Abstract

Path diversity works by setting up multiple parallel connections between the end points using the topological path redundancy of the network. In this paper, Forward Error Correction (FEC) is applied across multiple independent paths to enhance the end-to-end reliability. Network paths are modeled as erasure Gilbert-Elliot channels [1]-[5]. It is known that over any erasure channel, Maximum Distance Separable (MDS) codes achieve the minimum probability of irrecoverable loss among all block codes of the same size [6], [7]. Based on the adopted model for the error behavior, we prove that the probability of irrecoverable loss for MDS codes decays exponentially for an asymptotically large number of paths. Then, the optimal rate allocation problem is solved for the asymptotic case where the number of paths is large. Moreover, it is shown that in such an asymptotically optimal rate allocation, each path is assigned a positive rate iff its quality is above a certain threshold. The quality of a path is defined as the percentage of the time it spends in the bad state. Finally, using dynamic programming, a heuristic suboptimal algorithm with polynomial runtime is proposed for rate allocation over a finite number of paths. This algorithm converges to the asymptotically optimal rate allocation when the number of paths is large. The simulation results show that the proposed algorithm approximates the optimal rate allocation (found by exhaustive search) very closely for practical numbers of paths, and provides significant performance improvement compared to the alternative schemes of rate allocation.

Index Terms 

Path diversity, Internet, MDS codes, erasure, forward error correction, rate allocation, complexity. 

I. Introduction 

In recent years, path diversity over the Internet has received significant attention. It has been shown that path diversity has the ability to simultaneously improve the end-to-end rate and reliability [3], [8]-[10]. In a dense network like the Internet, it is usually possible to find multiple independent paths between most pairs of nodes [11]-[16]. A set of paths is defined to be independent if their corresponding packet loss and delay characteristics are independent. Clearly, disjoint paths would be independent too [3], [4], [8], [11], [12], [17]-[19]. Even when the paths are not completely disjoint, their loss and delay patterns may show a high degree of independence as long as the nodes and links they share are not congestion points or bottlenecks [3], [11], [12], [14], [16]-[19]. In this paper, Forward Error Correction (FEC) is applied across multiple independent paths. Based on this model, we show that path diversity significantly enhances the performance of FEC.

In order to apply path diversity over any packet switched network, two problems need to be addressed: 
i) setting up multiple independent paths between the end-nodes, ii) utilizing the given independent paths 
to improve the end-to-end throughput and/or reliability. In this paper, we focus on the second problem 
only. However, it should be noted that the first problem has also received significant attention in the 
literature (see [8], [11], [12], [16], [19]-[26]). In case the end-points have enough control over the path 
selection process, the centralized and distributed algorithms in references [27] and [28] can be used to find 
multiple disjoint paths over a large connected graph. However, applying such algorithms over the Internet 
requires modification of IP routing protocol and extra signaling between the nodes (routers). Of course, 
modifying the traditional IP network is extremely costly. To avoid such an expense, overlay networks 
are introduced [16], [19], [29]. The basic idea of overlay networks is to equip very few nodes (smart 
nodes) with the desired new functionalities while the rest remain unchanged. The smart nodes form a 
virtual network connected through virtual or logical links on top of the actual network. Thus, overlay 
nodes can be used as relays to set up independent paths between the end nodes [22], [24]-[26], [30]. 
Han et al. have experimentally studied the number of available disjoint paths in the Internet using overlay
networks [11]. They have also discussed the impact of network path diversity on the performance of overlay 
networks [12], [21]. Reference [20] addresses the problem of distributed overlay network design based 
on a game theoretical approach. Many other researchers have tried to optimize the design of overlay 
networks such that they offer the maximum degree of path diversity [22], [25], [26], [30]. Moreover, 
the idea of multihoming is proposed to set up extra independent paths between the end-points [23], [24]. In this technique, the end users are connected to more than one Internet Service Provider (ISP) simultaneously. It is shown that combining multihoming with overlay assisted routing can improve the end-to-end performance considerably [24]. In the cases where the backbone network partially consists of optical links between the nodes, each optical fiber conveys tens of independent channels (tones). There have been efforts to take advantage of this inherent physical layer diversity in optical networks [30].

Recently, path diversity has been utilized in many applications (see [4], [31]-[34]). Reference [32] combines multiple description coding and path diversity to improve quality of service (QoS) in video streaming. Packet scheduling over multiple paths is addressed in [35] to optimize the rate-distortion function of a video stream. Reference [34] utilizes path diversity to improve the quality of Voice over IP streams. According to [34], sending some redundant voice packets through an extra path helps the receiver buffer and the scheduler optimize the trade-off between the maximum tolerable delay and the packet loss ratio. In [8], multipath routing of TCP packets is applied to control the congestion with minimum
signaling overhead. Content Distribution Networks (CDN's) can also take advantage of path diversity 
for performance improvement. CDN's are a special type of overlay networks consisting of Edge Servers 
(nodes) responsible for delivery of the contents from an original server to the end users [29], [36]. Current 
commercial CDN's like Akamai use path diversity based techniques like SureRoute to ensure that the edge 
servers maintain reliable connections to the original server. Video server selection schemes are discussed 
in [22] to maximize path diversity in CDN's. 

Moreover, references [9] and [3] study the problem of rate allocation over multiple paths. Assuming 
each path follows the leaky bucket model, reference [9] shows that a water-filling scheme provides the 
minimum end-to-end delay. On the other hand, reference [3] considers a scenario of multiple senders and 
a single receiver, assuming all the senders share the same source of data. The connection between each 
sender and the receiver is assumed to follow the Gilbert-Elliot model. They propose a receiver-driven 
protocol for packet partitioning and rate allocation. The packet partitioning algorithm ensures no sender 
sends the same packet, while the rate allocation algorithm minimizes the probability of irrecoverable 
loss in the FEC scheme [3]. They only address the rate allocation problem for the case of two paths. 
A brute-force search algorithm is proposed in [3] to solve the problem. Generalization of this algorithm 
over multiple paths results in an exponential complexity in terms of the number of paths. Moreover, it 
should be noted that the scenario of [3] is equivalent, without any loss of generality, to the case in which 
multiple independent paths connect a pair of end-nodes as they assume the senders share the same data. 

Maximum Distance Separable (MDS) codes have been shown to be optimum in the sense that they achieve the maximum possible minimum distance ($d_{\min}$) among all the block codes of the same size [37]. Indeed, any $[N, K]$ MDS code (with block length $N$ and $K$ information symbols) can be successfully recovered from any subset of its entries of length $K$ or more. This property makes MDS codes favorable FEC schemes over the erasure channels like the Internet [38]-[40]. However, the simple and practical encoding-decoding algorithms for such codes have quadratic time complexity in terms of the code size [41]. Theoretically, more efficient ($O(N \log^2 N)$) MDS codes can be constructed based on evaluating and interpolating polynomials over specially chosen finite fields using the Discrete Fourier Transform [42], but these methods are not competitive in practice with the simpler quadratic methods except for extremely large block sizes. Recently, a family of almost-MDS codes with low encoding-decoding time complexity (linear in terms of the code length) has been proposed and shown to be practical over the erasure channels like the Internet [43], [44]. In these codes, any subset of symbols of size $K(1 + \epsilon)$ is sufficient to recover the original $K$ symbols with high probability [44].

MDS codes also require alphabets of a large size. Indeed, all the known MDS codes have alphabet sizes growing at least linearly with the block length $N$. There is a conjecture stating that all the $[N, K]$ MDS codes over the Galois field $\mathbb{F}_q$ with $1 < K < N - 1$ have the property that $N \le q + 1$, with two exceptions [37]. However, this is not an issue in the practical networking applications since the alphabet size is $q = 2^r$ where $r$ is the packet size, i.e. the block size is much smaller than the alphabet size. Algebraic computation over Galois fields ($\mathbb{F}_q$) of such cardinalities is now practically possible with the increasing processing power of electronic circuits. Note that network coding schemes, recently proposed and applied for content distribution over large networks, have a comparable computational complexity [45]-[47].
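
The recover-from-any-$K$ property can be illustrated with a toy Reed-Solomon style code built from polynomial evaluation and Lagrange interpolation. The sketch below is only an illustration under stated assumptions: the prime field size `P = 257`, the evaluation points $1, \ldots, N$, and all function names are hypothetical choices, not the coding scheme used in this paper. It uses the simple quadratic-time interpolation and requires $N < P$ so that the evaluation points are distinct.

```python
# Toy [N, K] MDS erasure code over the prime field GF(P) (illustrative only).
P = 257  # assumption: a small prime chosen for readability


def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def encode(message, n):
    """Map K message symbols to N code symbols: evaluations at x = 1..N."""
    return [poly_eval(message, x) for x in range(1, n + 1)]


def poly_mul_linear(poly, root):
    """Multiply poly by (x - root), mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out


def decode(received, k):
    """Recover the K message symbols from any K surviving (x, y) pairs."""
    pts = received[:k]
    message = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(pts):
            if m != j:
                basis = poly_mul_linear(basis, xm)
                denom = (denom * (xj - xm)) % P
        # Modular inverse via Fermat's little theorem (P is prime).
        scale = yj * pow(denom, P - 2, P) % P
        for i, c in enumerate(basis):
            message[i] = (message[i] + scale * c) % P
    return message


message = [10, 20, 30]                  # K = 3 information symbols
codeword = encode(message, 6)           # N = 6 transmitted symbols
# Suppose the symbols at x = 1, 3, 5 are erased; any 3 survivors suffice.
survivors = [(2, codeword[1]), (4, codeword[3]), (6, codeword[5])]
recovered = decode(survivors, 3)        # equals message
```

Any other choice of three surviving positions recovers the same message, which is the defining MDS property for erasures.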

In this work, we utilize path diversity to improve the performance of FEC between two end-nodes over a general packet switched network like the Internet. The details of the path setup process are not discussed here. More precisely, it is assumed that $L$ independent paths are set up by a smart overlay network or any other means [8], [11], [12], [16], [18]-[26]. Each path is modeled by a two-state continuous-time Markov process called the Gilbert-Elliot channel [1]-[5]. The probability of irrecoverable loss ($P_E$) is defined as the measure of FEC performance. It is known that MDS block codes have the minimum probability of error over our end-to-end channel model, and over any other erasure channel with or without memory [6], [7]. Applying MDS codes, our analysis shows an exponential decay of $P_E$ with respect to $L$ for the asymptotic case where the number of paths is large. Of course, in many practical cases, the number of disjoint or independent paths between the end nodes is limited. However, in our asymptotic analysis, we have assumed that it is possible to find $L$ independent paths between the end points even when $L$ is large. Moreover, the optimal rate allocation problem is solved in the asymptotic case. It is seen that in the asymptotically optimal rate allocation, each path is assigned a positive rate iff its quality is above a certain threshold. The quality of a path is defined as the percentage of the time it spends in the bad state. Furthermore, using dynamic programming, a heuristic suboptimal algorithm is proposed for rate allocation over a finite number of paths (limited $L$). Unlike the brute-force search, this algorithm has a polynomial complexity in terms of the number of paths. It is shown that the result of this algorithm converges to the asymptotically optimal solution for a large number of paths. Finally, the proposed algorithm is simulated and compared with the optimal rate allocation found by exhaustive search for practical numbers of paths. Simulation results verify the near-optimal performance of the proposed suboptimal algorithm in practical scenarios.

The rest of this paper is organized as follows. Section II describes the system model. The probability distribution of the bad burst duration is discussed in Section III. The performance of FEC in the three cases of a single path, multiple identical paths, and non-identical paths is analyzed in Section IV. Section V studies the rate allocation problem and proposes a suboptimal rate allocation algorithm. Finally, Section VI concludes the paper.

Fig. 1. Continuous-time two-state Markov model of the end-to-end channel

II. System Modeling and Formulation 

A. End-to-End Channel Model 

From an end to end protocol's perspective, performance of the lower layers in the protocol stack can be 
modeled as a random channel called the end-to-end channel. Since each packet usually includes an internal 
error detection coding (for instance a Cyclic Redundancy Check), the end-to-end channel is satisfactorily 
modeled as an erasure channel. Delay of the end-to-end channel is strongly dependent on its packet loss 
pattern, and affects the QoS considerably [48], [49]. 

In this work, the model assumed for the end-to-end channel is a two-state Markov model called the Gilbert-Elliot cell, depicted in Fig. 1. The channel spends an exponentially distributed random amount of time with the mean $\frac{1}{\mu_g}$ in the Good state. Then, it alternates to the Bad state and stays in that state for another random duration exponentially distributed with the mean $\frac{1}{\mu_b}$. It is assumed that the channel state does not change during the transmission of a given packet [4], [50], [51]. Hence, if a packet is transmitted from the source at any time during the good state, it will be received correctly. Otherwise, if it is transmitted during the bad state, it will eventually be lost before reaching the destination. Therefore, the average probability of error is equal to the steady state probability of being in the bad state, $\pi_b = \frac{\mu_g}{\mu_g + \mu_b}$. To have a reasonably low probability of error, $\mu_g$ must be much smaller than $\mu_b$. This model is widely used in the literature for theoretical analysis where delay is not a significant factor [1]-[5], [50]-[52]. Despite its
simplicity, this model satisfactorily captures the bursty error characteristic of the end-to-end channel. More 
comprehensive models like the hidden Markov model are introduced in [49], [53]. Although analytically 
cumbersome, such models express the dependency of loss and delay more accurately. 
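
As a sanity check on the two-state model, the long-run fraction of time spent in the bad state can be estimated by direct simulation and compared with the steady-state value $\mu_g / (\mu_g + \mu_b)$, where $\mu_g$ and $\mu_b$ are the rates of leaving the good and bad states. The sketch below is a minimal illustration; the rate values, the horizon, the seed, and the function name are hypothetical and not taken from the paper.

```python
import random


def bad_time_fraction(mu_g, mu_b, horizon, rng):
    """Simulate the two-state chain for `horizon` seconds, starting in the
    good state, and return the fraction of time spent in the bad state."""
    t, bad_time, in_good = 0.0, 0.0, True
    while t < horizon:
        rate = mu_g if in_good else mu_b
        # Exponential dwell time, truncated at the end of the horizon.
        dwell = min(rng.expovariate(rate), horizon - t)
        if not in_good:
            bad_time += dwell
        t += dwell
        in_good = not in_good
    return bad_time / horizon


rng = random.Random(1)
mu_g, mu_b = 1.0, 50.0            # hypothetical rates in 1/s (mu_g << mu_b)
estimate = bad_time_fraction(mu_g, mu_b, 20000.0, rng)
steady_state = mu_g / (mu_g + mu_b)
print(f"simulated: {estimate:.4f}, steady state: {steady_state:.4f}")
```

With a long enough horizon the two numbers should agree closely, since the starting state has a vanishing influence on the time average.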

B. Typical FEC Model 

A concatenated coding is used for packet transmission. The coding inside each packet can be a simple Cyclic Redundancy Check (CRC) which enables the receiver to detect an error inside each packet. Then, the receiver can consider the end-to-end channel as an erasure channel. Other than the coding inside each packet, a Forward Error Correction (FEC) scheme is applied between packets. Every $K$ packets are encoded to a block of $N$ packets where $N > K$ to create some redundancy. The $N$ packets of each block are distributed across the $L$ available independent paths, and are received at the destination with some loss (erasure). The ratio $\alpha = \frac{N - K}{N}$ defines the FEC overhead. A Maximum Distance Separable (MDS) $[N, K]$ code, such as the Reed-Solomon code, can reconstruct the original $K$ data packets at the receiver side if $K$ or more of the $N$ packets are received correctly [54]. According to the following theorem, an MDS code is the optimum block code we can design over any erasure channel. Although FEC imposes some bandwidth overhead, it might be the only option when feedback and retransmission are not feasible or fast enough to provide the desirable QoS.

Fig. 2. Rate allocation problem: a block of $N$ packets is being sent from the source to the destination through $L$ independent paths over the network during the time interval $T$ with the required rate $S_{req} = \frac{N}{T}$. The block is distributed over the paths according to the vector $\mathbf{N} = (N_1, \ldots, N_L)$, which corresponds to the rate allocation vector $\mathbf{S} = (S_1, \ldots, S_L)$.

Definition I. An erasure channel is defined as the one which maps every input symbol to either itself or to an erasure symbol $\xi$. More accurately, an arbitrary channel (memoryless or with memory) with the input vector $\mathbf{x} \in \mathcal{X}^N$, $|\mathcal{X}| = q$, the output vector $\mathbf{y} \in (\mathcal{X} \cup \{\xi\})^N$, and the transition probability $p(\mathbf{y}|\mathbf{x})$ is defined to be erasure iff it satisfies the following conditions:

1) $p(y_j \notin \{x_j, \xi\} \mid x_j) = 0, \ \forall j$.

2) Defining the erasure identifier vector $\mathbf{e}$ as

$$e_j = \begin{cases} 1 & y_j = \xi \\ 0 & \text{otherwise} \end{cases}$$

$p(\mathbf{e}|\mathbf{x})$ is independent of $\mathbf{x}$.

Theorem I. A block code of size $[N, K]$ with equiprobable codewords over an arbitrary erasure channel (memoryless or with memory) has the minimum probability of error (assuming optimum, i.e., maximum likelihood decoding) among all block codes of the same size if that code is Maximum Distance Separable (MDS). The proof is given in [6], [7].

C. Rate Allocation Problem

The network is modeled as follows. $L$ independent paths, $1, 2, \ldots, L$, connect the source to the destination, as indicated in Fig. 2(a). Information bits are transmitted as packets, each of a constant length $r$. Furthermore, there is a constraint on the maximum rate for each path, meaning that the $i$'th path can support a maximum rate of $W_i$ packets per second. This constraint can be considered as an upper bound imposed by the physical characteristics of the path. As an example, [55] introduces the concept of the maximum TCP-friendly bandwidth for the maximum capacity of an Internet path. The $W_i$'s are assumed to be known at the transmitter side. For a specific application and FEC scheme, we require a rate of $S_{req}$ packets per second from the source to the destination. Obviously, we should have $S_{req} \le \sum_{i=1}^{L} W_i$ to have a feasible solution. The information packets are assumed to be coded in blocks of length $N$ packets. Hence, it takes $T = \frac{N}{S_{req}}$ seconds to transmit a block of packets. In practical scenarios with a finite number of paths, the end-to-end required rate ($S_{req}$) is given, and the values of $N$ and $T$ have to be chosen based on the feasible complexity of the MDS decoder and the delay constraint of the application, respectively.

According to the FEC model, we can send $N_i$ packets through path $i$ as long as $\sum_{i=1}^{L} N_i = N$ and $\frac{N_i}{T} \le W_i$. The rate assigned to path $i$ can be expressed as $S_i = \frac{N_i}{T} = \frac{N_i}{N} S_{req}$, since the transmission instants of the $N_i$ packets are distributed evenly over the block duration $T$ (see Fig. 2(b)). Obviously, we have $\sum_{i=1}^{L} S_i = S_{req}$. The objective of the rate allocation problem is to find the optimal rate allocation vector or, equivalently, the vector $\mathbf{N} = (N_1, \ldots, N_L)$ which minimizes the probability of irrecoverable loss ($P_E$).
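
The bookkeeping between a packet allocation and the per-path rates can be sketched as follows. This is a minimal illustration of the relations described above (block time equals block length divided by required rate, each path's rate is its share of the block times the required rate, and no path may exceed its maximum rate); the function name and the numerical values are hypothetical.

```python
def rates_from_allocation(n_alloc, s_req, w_max):
    """Given packets per path (n_alloc), the required end-to-end rate s_req
    (pkt/s) and per-path maximum rates w_max (pkt/s), return the block time,
    the per-path rates, and whether the allocation respects every w_max."""
    n_total = sum(n_alloc)
    block_time = n_total / s_req                      # T = N / S_req
    rates = [n_i * s_req / n_total for n_i in n_alloc]  # S_i = (N_i / N) S_req
    feasible = all(s_i <= w_i for s_i, w_i in zip(rates, w_max))
    return block_time, rates, feasible


# Hypothetical example: N = 200 packets split over L = 3 paths.
block_time, rates, feasible = rates_from_allocation(
    n_alloc=[100, 60, 40], s_req=1000.0, w_max=[600.0, 400.0, 200.0])
print(block_time, rates, feasible)
```

By construction the per-path rates sum to the required rate, so only the per-path upper bounds need to be checked.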

The above formulation of the rate allocation problem is valid for any finite number of paths and any chosen values of $N$ and $T$. However, in Section IV where the performance of path diversity is studied for a large number of paths, and also in Theorem III where the optimality of the proposed suboptimal algorithm is proved for the asymptotic case, we assume that $N$ grows linearly in terms of the number of paths, i.e. $N = nL$, for a fixed $n$. The reason behind this assumption is that when $L$ grows asymptotically large, the number of paths eventually exceeds the block length, if $N$ stays fixed. Thus, $L - N$ paths become useless for values of $L$ larger than $N$. At the same time, it is assumed that the delay imposed by FEC, $T$, stays fixed with respect to $L$. This model results in a linearly increasing rate as the number of paths grows. We will later show that utilizing multiple paths, it is possible to simultaneously achieve an exponential decay in $P_E$ and a linear increase in rate, while the delay stays constant.

In this work, an irrecoverable loss is defined as the event where more than $N - K$ packets are lost in a block of $N$ packets. $P_E$ denotes the probability of this event. It should be noted that this probability is different from the decoding error probability of a maximum likelihood decoder performed on an MDS $[N, K]$ code, denoted by $P\{\mathcal{E}\}$. Theoretically, an optimum maximum likelihood decoder of an MDS code may still decode the original codeword correctly with a positive, but very small probability, if it receives fewer than $K$ symbols (packets). More precisely, such a decoder is able to correctly decode an MDS code over $\mathbb{F}_q$ with the probability of $\frac{1}{q^i}$ after receiving $K - i$ correct symbols (see the proof of Theorem I in [6], [7] for more details). Of course, for Galois fields with a large cardinality, this probability is usually negligible. The relationship between $P_E$ and $P\{\mathcal{E}\}$ can be summarized as follows:

$$
\begin{aligned}
P\{\mathcal{E}\} &= P_E - \sum_{i=1}^{K} \frac{P\{K - i \text{ packets received correctly}\}}{q^i} \\
&\ge P_E - \frac{1}{q} \sum_{i=1}^{K} P\{K - i \text{ packets received correctly}\} \\
&= P_E \left(1 - \frac{1}{q}\right).
\end{aligned} \tag{1}
$$

Hence, $P\{\mathcal{E}\}$ is bounded as

$$P_E \left(1 - \frac{1}{q}\right) \le P\{\mathcal{E}\} \le P_E. \tag{2}$$

The reason $P_E$ is used as the measure of system performance is that while many practical low-complexity decoders for MDS codes work perfectly if the number of correctly received symbols is at least $K$, their probability of correct decoding is much less than that of maximum likelihood decoders when the number of correctly received symbols is less than $K$ [54]. Thus, in the rest of this paper, $P_E$ is used as a close approximation of the decoding error probability.



III. Probability Distribution of Bad Bursts 

The continuous random variable $B_i$ is defined as the duration of time that path $i$ spends in the bad state in a block duration, $T$. We denote the values of $B_i$ with the parameter $t$ to emphasize that they are expressed in units of time. In this section, we focus on one path, for example path 1. Therefore, the index $i$ can be temporarily dropped in analyzing the probability distribution function (pdf) of $B_i$.

We define the events $g$ and $b$, respectively, as the channel being in the good or bad state at the start of a block. Then, the distribution of $B$ can be written as

$$f_B(t) = f_{B|b}(t)\,\pi_b + f_{B|g}(t)\,\pi_g. \tag{3}$$

To proceed further, two assumptions are made. First, it is assumed that $\pi_g \gg \pi_b$ or equivalently $\frac{1}{\mu_g} \gg \frac{1}{\mu_b}$. This condition is valid for a channel with a reasonable quality. Besides, the block time $T$ is assumed to be much shorter than the average good state duration, i.e. $1 \gg \mu_g T$, such that $T$ can contain either none or a single interval of bad burst (see [1], [3], [4] for justification). More precisely, the probability of having at least two bad bursts is negligible compared to the probability of having exactly one bad burst. However, it should be noted that all the results of this paper except subsection IV-A remain valid regardless of these two assumptions. Of course, in that case, the exact probability distribution function of $B_i$ should be used instead of the approximation used here (refer to Remark I in subsection IV-B).

Fig. 3. A bad burst of duration $B_i$ happens in a block of length $T$. $E_i = 3$ packets are corrupted or lost during the interval $B_i$. Packets are transmitted every $\frac{1}{S_i}$ seconds, where $S_i$ is the rate of path $i$ in pkt/sec.

Hence, the pdf of $B$ conditioned on the event $b$ can be approximated as

$$f_{B|b}(t) = \mu_b e^{-\mu_b t} + \delta(t - T)\, e^{-\mu_b T} \tag{4}$$

where $\delta(u)$ is the Dirac delta function. Equation (4) follows from the memoryless nature of the exponential distribution, the assumption that $T$ contains at most one bad burst, and the fact that any bad burst longer than $T$ has to be truncated at $B = T$.

To compute $f_{B|g}(t)$, we have

$$f_{B|g}(t) = P\{B = 0 \mid g\}\,\delta(t) - \frac{\partial}{\partial t} P\{B > t \mid g\} \tag{5}$$

where

$$P\{B = 0 \mid g\} = e^{-\mu_g T} \approx 1 - \mu_g T \tag{6}$$

and

$$P\{B > t \mid g\} \overset{(a)}{=} \left(1 - e^{-\mu_g (T - t)}\right) e^{-\mu_b t} \approx \mu_g (T - t)\, e^{-\mu_b t} \tag{7}$$

where (a) results from the fact that $\{B > t \mid g\}$ is equivalent to the initial good burst being shorter than $T - t$, and the following bad burst being longer than $t$, and the duration $T$ containing at most one bad burst. Now, combining equations (3) through (7), $f_B(t)$ can be computed.

A. Discrete to Continuous Approximation 

To compute the probability of irrecoverable loss ($P_E$), we have to find the probability of $k_i$ packets being lost out of the $N_i$ packets transmitted through path $i$, for $i$ from 1 to $L$ and $k_i$ from 0 to $N_i$. Let us denote the number of erroneous or lost packets over path $i$ with the random variable $E_i$. Any two subsequent packets transmitted over path $i$ are $\frac{1}{S_i}$ seconds apart in time, where $S_i$ is the transmission rate over the $i$'th path. We observe that the probability $P\{E_i > k_i\}$ can be approximated with the continuous counterpart $P\{B_i > \frac{k_i}{S_i}\}$ when the inter-packet interval is much shorter than the typical bad burst ($\frac{1}{S_i} \ll \frac{1}{\mu_b}$, or equivalently $\mu_b \ll S_i$). The necessity of this condition can be intuitively justified as follows. In case this condition does not hold, any two consecutive packets have to be transmitted on two independent states of the channel. Thus, no gain would be achieved by applying diversity over multiple independent paths. Figure 3 shows an example of this approximation in detail. The continuous approximation simplifies the mathematical analysis as discussed in Section IV.

Fig. 4. Probability of irrecoverable loss versus $\mu_b T$ for one path with fixed $\mu_g$, $T$ and $\alpha$.



IV. Performance Analysis of FEC on Multiple Paths 

Assume that a rate allocation algorithm assigns $N_i$ packets to path $i$. According to the discrete to continuous approximation in subsection III-A, when the $N_i$ packets of the FEC block are sent over path $i$, the loss count can be written as $\frac{B_i}{T} N_i$. Hence, the total ratio of lost packets is equal to

$$\sum_{i=1}^{L} \frac{B_i N_i}{T N} = \sum_{i=1}^{L} \rho_i x_i$$

where $\rho_i = \frac{N_i}{N}$, $0 \le \rho_i \le 1$, denotes the portion of the bandwidth assigned to path $i$. $x_i = \frac{B_i}{T}$ is defined as the portion of time that path $i$ has been in the bad state ($0 \le x_i \le 1$). Hence, the probability of irrecoverable loss for an MDS code is equal to

$$P_E = P\left\{\sum_{i=1}^{L} \rho_i x_i > \alpha\right\} \tag{8}$$

where $\alpha = \frac{N - K}{N}$. In order to find the optimum rate allocation, $P_E$ has to be minimized with respect to the allocation vector ($\rho_i$'s), subject to the following constraints:

$$0 \le \rho_i \le \min\left\{1, \frac{W_i T}{N}\right\}, \qquad \sum_{i=1}^{L} \rho_i = 1 \tag{9}$$

where $W_i$ is the bandwidth constraint on path $i$ defined in subsection II-C. Note that since the $x_i$'s are proportional to the $B_i$'s, their pdf can be easily computed based on the pdf of the $B_i$'s.
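
The loss probability in (8) can also be estimated by simulation for any candidate allocation, which is a useful cross-check on the analysis. The sketch below draws, for each path, the fraction of a block spent in the bad state from the exact two-state chain (so it does not rely on the single-burst approximation), and counts how often the weighted sum exceeds the overhead $\alpha$. The parameter values, seed, and function names are hypothetical; the allocation is passed as the vector of shares $\rho_i$.

```python
import random


def bad_fraction_in_block(mu_g, mu_b, block_time, rng):
    """Fraction of one block that a path spends in the bad state, with the
    initial state drawn from the steady-state distribution."""
    in_good = rng.random() < mu_b / (mu_g + mu_b)
    t, bad = 0.0, 0.0
    while t < block_time:
        dwell = min(rng.expovariate(mu_g if in_good else mu_b), block_time - t)
        if not in_good:
            bad += dwell
        t += dwell
        in_good = not in_good
    return bad / block_time


def estimate_loss_probability(rho, paths, block_time, alpha, trials, rng):
    """Monte Carlo estimate of P{sum_i rho_i * x_i > alpha}."""
    if abs(sum(rho) - 1.0) > 1e-9 or any(r < 0 for r in rho):
        raise ValueError("rho must be non-negative and sum to 1")
    losses = 0
    for _ in range(trials):
        lost = sum(r * bad_fraction_in_block(mg, mb, block_time, rng)
                   for r, (mg, mb) in zip(rho, paths))
        losses += lost > alpha
    return losses / trials


rng = random.Random(7)
paths = [(1.0, 50.0)] * 4          # hypothetical (mu_g, mu_b) per path, in 1/s
p_single = estimate_loss_probability([1, 0, 0, 0], paths, 0.2, 0.1, 20000, rng)
p_four = estimate_loss_probability([0.25] * 4, paths, 0.2, 0.1, 20000, rng)
print(p_single, p_four)
```

For these illustrative parameters, spreading the block uniformly over the four identical paths should give a visibly smaller estimate than sending everything over one path.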
A. Performance of FEC on a Single Path



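The probability of irrecoverable loss for one path is equal to

$$P_E = P\{B > \alpha T\} = P\{B > \alpha T \mid b\}\,\pi_b + P\{B > \alpha T \mid g\}\,\pi_g$$

where $P\{B > \alpha T \mid b\}$ and $P\{B > \alpha T \mid g\}$ can be computed as

$$
\begin{aligned}
P\{B > \alpha T \mid b\} &= \int_{\alpha T}^{T} f_{B|b}(t)\,dt = e^{-\mu_b \alpha T}, \\
P\{B > \alpha T \mid g\} &= \int_{\alpha T}^{T} f_{B|g}(t)\,dt = \mu_g (1 - \alpha) T\, e^{-\mu_b \alpha T}
\end{aligned}
$$

when the assumptions in Section III and equations (4) and (7) are used. Thus, we have

$$
\begin{aligned}
P_E &= \pi_b\, e^{-\mu_b \alpha T} \left(1 + \mu_b (1 - \alpha) T\right) \\
&\overset{(a)}{\approx} \left(\frac{\mu_g}{\mu_b} + (1 - \alpha) T \mu_g\right) e^{-\mu_b \alpha T}
\end{aligned} \tag{10}
$$

where (a) follows from the assumption that the end-to-end channel has a low probability of error ($\frac{1}{\mu_g} \gg \frac{1}{\mu_b}$).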

As we observe, for large values of $\mu_b T$, $P_E$ decays exponentially with $\mu_b T$. Figure 4 shows the results of simulating a typical scenario of streaming data between two end-points with the rate $S_{req} = 1000$ pkt/sec, the block length $N = 200$, and the number of information packets $K = 180$. These values result in a block transmission time of $T = 200$ ms. The average good burst of the end-to-end channel, $\frac{1}{\mu_g}$, is selected such that $\mu_g T$ is fixed. However, the average bad burst, $\frac{1}{\mu_b}$, varies such that $\mu_b T$ varies from 8 to 40, in accordance with the values in [3], [4]. The slope of the best linear fit (in semilog scale) to the simulation points is 0.097, which is in accordance with the value of 0.100 resulting from the theoretical approximation in (10).
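
The quoted slope can be reproduced approximately from the closed-form approximation alone. The sketch below assumes the single-path approximation reads $P_E \approx \left(\frac{\mu_g}{\mu_b} + (1 - \alpha) T \mu_g\right) e^{-\mu_b \alpha T}$, evaluates it for $\mu_b T$ between 8 and 40 with $\alpha = (N - K)/N = 0.1$, and fits a line to $\ln P_E$. The value used for $\mu_g T$ is a hypothetical placeholder (it only shifts the curve vertically and does not affect the slope), and the function names are illustrative.

```python
import math


def single_path_loss(mu_g_T, mu_b_T, alpha):
    """Approximate single-path loss probability as a function of the
    dimensionless products mu_g*T and mu_b*T."""
    return (mu_g_T / mu_b_T + (1.0 - alpha) * mu_g_T) * math.exp(-alpha * mu_b_T)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


alpha = (200 - 180) / 200          # FEC overhead for N = 200, K = 180
mu_g_T = 0.2                       # hypothetical value, not from the paper
xs = list(range(8, 41, 4))         # mu_b * T from 8 to 40
ys = [math.log(single_path_loss(mu_g_T, x, alpha)) for x in xs]
slope = least_squares_slope(xs, ys)
print(f"fitted slope of ln(P_E) versus mu_b*T: {slope:.4f}")
```

The fitted slope is slightly steeper than $-\alpha$ because the pre-exponential factor also decreases with $\mu_b T$; the dominant term $e^{-\alpha \mu_b T}$ accounts for the value of about 0.1.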

B. Identical Paths 

When the paths are identical and have equal bandwidth constraints ($W_i = W$ for all $1 \le i \le L$; the case where the $W_i$'s are different is discussed in Remark V of subsection IV-C), due to the symmetry of the problem, the uniform rate allocation ($\rho_i = \frac{1}{L}$) is obviously the optimum solution. Of course, the solution is feasible only when we have $\frac{1}{L} \le \frac{WT}{N}$. Then, the probability of irrecoverable loss can be simplified as

$$P_E = P\left\{\frac{1}{L} \sum_{i=1}^{L} x_i > \alpha\right\}. \tag{11}$$

Let us define $Q(x)$ as the probability distribution function of $x$. Since $x$ is defined as $x = \frac{B}{T}$, clearly we have $Q(x) = T f_B(xT)$. Defining $E\{\cdot\}$ as the expected value operator throughout this paper, $E\{x\}$ can be computed based on $Q(x)$. We observe that in (11), the random variables $x_i$ are bounded and independent. Hence, the following well-known upper bound in large deviation theory [56] can be applied
$$P_E \le e^{-u(\alpha) L}, \qquad u(\alpha) = \begin{cases} 0 & \text{for } \alpha \le E\{x\} \\ \lambda \alpha - \log\left(E\{e^{\lambda x}\}\right) & \text{otherwise} \end{cases} \tag{12}$$

where the log function is computed in Neperian base, and $\lambda$ is the solution of the following non-linear equation, which is shown to be unique by Lemma I.

$$\alpha = \frac{E\{x e^{\lambda x}\}}{E\{e^{\lambda x}\}} \tag{13}$$

Since $\lambda$ is unique, we can define $l(\alpha) = \lambda$. Even though it is an upper bound, inequality (12) is exponentially tight for large values of $L$ [56]. More precisely

$$P_E \doteq e^{-u(\alpha) L} \tag{14}$$

where the notation $\doteq$ means $\lim_{L \to \infty} -\frac{\log P_E}{L} = u(\alpha)$. Now, we state two useful lemmas whose proofs can be found in Appendices A and B.

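Lemma I. $u(\alpha)$ and $l(\alpha)$ have the following properties:

1) $\frac{\partial}{\partial \alpha} l(\alpha) > 0$

2) $l(\alpha = 0) = -\infty$

3) $l(\alpha = E\{x\}) = 0$

4) $l(\alpha = 1) = +\infty$

5) $\frac{\partial}{\partial \alpha} u(\alpha) = l(\alpha) > 0$ for $\alpha > E\{x\}$

Lemma II. Defining $y = \frac{1}{L} \sum_{i=1}^{L} x_i$, where the $x_i$'s are i.i.d. random variables as already defined, the probability density function of $y$ satisfies $f_y(\alpha) \doteq e^{-u(\alpha) L}$, for all $\alpha > E\{x\}$.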
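
The exponent $u(\alpha)$ is straightforward to evaluate numerically: solve the tilted-mean equation $\alpha = E\{x e^{\lambda x}\} / E\{e^{\lambda x}\}$ for $\lambda$ by bisection (the left side is increasing in $\lambda$), then compute $\lambda \alpha - \log E\{e^{\lambda x}\}$. The sketch below does this for a discrete bounded random variable. As a check it uses a Bernoulli variable, for which the exponent is known to equal the Kullback-Leibler divergence between Bernoulli($\alpha$) and Bernoulli($p$); this distribution is an assumption made purely for illustration and is not the distribution $Q(x)$ of the paper. Function names and the bisection bounds are likewise hypothetical.

```python
import math


def tilted_mean(lam, values, probs):
    """E{x e^{lam x}} / E{e^{lam x}} for a discrete bounded random variable."""
    weights = [p * math.exp(lam * v) for v, p in zip(values, probs)]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def solve_lambda(alpha, values, probs, lo=-200.0, hi=200.0, iters=200):
    """Bisection for the lam whose tilted mean equals alpha."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid, values, probs) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_function(alpha, values, probs):
    """u(alpha): zero up to the mean, lam*alpha - log E{e^{lam x}} above it."""
    mean = sum(v * p for v, p in zip(values, probs))
    if alpha <= mean:
        return 0.0
    lam = solve_lambda(alpha, values, probs)
    log_mgf = math.log(sum(p * math.exp(lam * v) for v, p in zip(values, probs)))
    return lam * alpha - log_mgf


p, a = 0.1, 0.3                                   # illustrative Bernoulli case
u = rate_function(a, [0.0, 1.0], [1.0 - p, p])
kl = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
print(f"u(alpha) = {u:.6f}, KL divergence = {kl:.6f}")
```

For the distribution of the paper, the same routine applies once $Q(x)$ is discretized on a grid of values in $[0, 1]$, at the cost of a discretization error that this sketch does not quantify.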

Figure 5 compares the theoretical and simulation results. We assume the block transmission time is $T = 200$ ms. The block length is proportional to the number of paths as $N = 20L$. The average good burst of the end-to-end channel, $\frac{1}{\mu_g}$, is selected such that $\mu_g T$ is fixed. The end-to-end channel has the error probability of $\pi_b = 0.015$. The coding overhead is changed from $\alpha = 0.05$ to $\alpha = 0.2$. The probability of irrecoverable loss is plotted versus the number of paths, $L$, in semilogarithmic scale in Fig. 5(a) for different values of $\alpha$. We observe that as $L$ increases, $\log P_E$ decays linearly, which is expected noting equation (12). Also, Fig. 5(b) compares the slope of each plot in Fig. 5(a) with $u(\alpha)$. Figure 5 shows a good agreement between the theory and the simulation results, and also verifies the fact that the stronger the FEC code is (larger $\alpha$), the higher is the gain we achieve through path diversity (larger exponent).

Remark I. Equation (14) is a direct result of the discrete to continuous approximation in subsection III-A. Therefore, it remains valid even if the other approximations in Section III do not hold. For example, if the block time contains more than one bad burst, equations (4) and (7) are no longer valid. However, equation (14) is still valid as long as the discrete to continuous approximation is used. Of course, in this case, the exact distributions of $B$ and $x$ should be used to compute $u(\alpha)$ and $\lambda$ instead of their simplified versions.

Remark II. A special case is when the block code uses all the bandwidth of the paths. In this case, we have $N = LWT$, where $W$ is the maximum bandwidth of each path, and $T$ is the block duration. Assuming $\alpha > E\{x\}$ is a constant independent of $L$, we observe that the information packet rate is equal to $\frac{K}{T} = (1 - \alpha) W L$, and the error probability is $P_E \doteq e^{-u(\alpha) L}$. This shows that using MDS codes over multiple independent paths provides an exponential decay in the irrecoverable loss probability and a linearly growing end-to-end rate in terms of the number of paths, simultaneously.

C. Non-Identical Paths 

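Now, let us assume there are $J$ types of paths between the source and the destination, consisting of $L_j$ identical paths of type $j$ ($\sum_{j=1}^{J} L_j = L$). Without loss of generality, we assume that the paths are ordered according to their associated type, i.e., the paths from $1 + \sum_{k=1}^{j-1} L_k$ to $\sum_{k=1}^{j} L_k$ are of type $j$. We denote $\gamma_j = L_j/L$. According to the i.i.d. assumption, it is obvious that $\rho_i$ has to be the same for all paths of the same type. $\eta_j$ and $y_j$ are defined as

$$\eta_j = \sum_{\sum_{k=1}^{j-1} L_k < i \le \sum_{k=1}^{j} L_k} \rho_i, \qquad y_j = \frac{\eta_j}{L_j} \sum_{\sum_{k=1}^{j-1} L_k < i \le \sum_{k=1}^{j} L_k} x_i. \qquad (15)$$

Following Lemma II, we observe that $f_{y_j}(\beta_j) \doteq e^{-\gamma_j L u_j(\beta_j/\eta_j)}$. We define the sets $S_I$, $S_O$ and $S_T$ as

$$S_I = \Big\{ (\beta_1, \beta_2, \ldots, \beta_J) \,\Big|\, 0 \le \beta_j \le 1,\ \sum_{j=1}^{J} \beta_j \ge \alpha \Big\}$$

$$S_O = \Big\{ (\beta_1, \beta_2, \ldots, \beta_J) \,\Big|\, 0 \le \beta_j \le 1,\ \sum_{j=1}^{J} \beta_j = \alpha \Big\}$$

$$S_T = \Big\{ (\beta_1, \beta_2, \ldots, \beta_J) \,\Big|\, \eta_j E\{x_j\} \le \beta_j,\ \sum_{j=1}^{J} \beta_j = \alpha \Big\}$$

respectively. Hence, $P_E$ can be written as

$$\begin{aligned}
P_E &\overset{(a)}{\doteq} \exp\Big( -L \min_{\beta \in S_I} \sum_{j=1}^{J} \gamma_j u_j\big(\tfrac{\beta_j}{\eta_j}\big) \Big) \\
&\overset{(b)}{\doteq} \exp\Big( -L \min_{\beta \in S_O} \sum_{j=1}^{J} \gamma_j u_j\big(\tfrac{\beta_j}{\eta_j}\big) \Big) \\
&\overset{(c)}{\doteq} \exp\Big( -L \min_{\beta \in S_T} \sum_{j=1}^{J} \gamma_j u_j\big(\tfrac{\beta_j}{\eta_j}\big) \Big) \\
&\overset{(d)}{\doteq} \exp\Big( -L \sum_{j=1}^{J} \gamma_j u_j\big(\tfrac{\beta_j^*}{\eta_j}\big) \Big) \qquad (16)
\end{aligned}$$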



where (a) follows from Lemma III, (b) follows from the fact that $u_j(\alpha)$ is a strictly increasing function of $\alpha$ for $\alpha > E\{x_j\}$, and (c) can be proved as follows. Let us denote the vector which minimizes the exponent over the set $S_O$ as $\hat{\beta}$. Since $S_T$ is a subset of $S_O$, $\hat{\beta}$ is either in $S_T$ or in $S_O - S_T$. In the former case, (c) is obviously valid. When $\hat{\beta} \in S_O - S_T$, we can prove that $0 \le \hat{\beta}_j \le \eta_j E\{x_j\}$ for all $1 \le j \le J$, by contradiction. Let us assume the opposite is true, i.e., there is at least one index $1 \le j \le J$ such that $0 \le \hat{\beta}_j < \eta_j E\{x_j\}$, and at least one other index $1 \le k \le J$ such that $\eta_k E\{x_k\} < \hat{\beta}_k$. Then, knowing that the derivative of $u_j(\alpha)$ is zero for $\alpha = E\{x_j\}$ and strictly positive for $\alpha > E\{x_j\}$, a small increase in $\hat{\beta}_j$ and an equal decrease in $\hat{\beta}_k$ reduces the objective function, $\sum_{j=1}^{J} \gamma_j u_j(\beta_j/\eta_j)$, which contradicts the assumption that $\hat{\beta}$ is a minimum point. Knowing that $0 \le \hat{\beta}_j \le \eta_j E\{x_j\}$ for all $1 \le j \le J$, it is easy to show that the minimum value of the objective function is zero over $S_O$, and $S_T$ has to be an empty set. Defining the minimum value of the positive objective function as zero over an empty set ($S_T$) makes (c) valid for the latter case where $\hat{\beta} \in S_O - S_T$. Finally, applying Lemma IV results in (d), where $\beta^*$ is defined in the lemma.

Lemma III. For any continuous positive function $h(\mathbf{x})$ over a convex set $S$, and defining $H(L)$ as

$$H(L) = \int_{S} e^{-h(\mathbf{x}) L}\, d\mathbf{x},$$

we have

$$\lim_{L \to \infty} -\frac{\log H(L)}{L} = \inf_{S} h(\mathbf{x}) = \min_{\mathrm{cl}(S)} h(\mathbf{x}),$$

where $\mathrm{cl}(S)$ denotes the closure of $S$ (refer to [57] for the definition of the closure operator). The proof of Lemma III can be found in appendix C.
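The limit in Lemma III can be checked numerically in one dimension. The sketch below is only an illustration: the test function `h`, the interval [0, 1], and the midpoint-rule integration are assumptions made for this example and do not come from the paper.

```python
import math

def decay_rate(h, L, a=0.0, b=1.0, n=20000):
    """Return -log(H(L))/L for H(L) = integral over [a, b] of exp(-h(x) L) dx."""
    dx = (b - a) / n
    vals = [h(a + (i + 0.5) * dx) for i in range(n)]   # midpoint rule samples
    m = min(vals)
    # Factor exp(-m L) out of the integral so the sum does not underflow.
    s = sum(math.exp(-(v - m) * L) for v in vals) * dx
    return m - math.log(s) / L

# Hypothetical test function whose minimum over [0, 1] is 0.5 (at x = 0.3).
h = lambda x: (x - 0.3) ** 2 + 0.5
rates = {L: decay_rate(h, L) for L in (50, 200, 4000)}
```

As $L$ grows, the entries of `rates` approach the minimum of `h` from above, which is the behavior the lemma describes.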

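Lemma IV. There exists a unique vector $\beta^*$ with the elements $\beta_j^* = \eta_j\, l_j^{-1}\big(\frac{\nu \eta_j}{\gamma_j}\big)$ which minimizes the convex function $\sum_{j=1}^{J} \gamma_j u_j\big(\frac{\beta_j}{\eta_j}\big)$ over the convex set $S_T$, where $\nu$ satisfies the following condition

$$\sum_{j=1}^{J} \eta_j\, l_j^{-1}\Big(\frac{\nu \eta_j}{\gamma_j}\Big) = \alpha. \qquad (17)$$

$l^{-1}()$ denotes the inverse of the function $l()$ defined in subsection IV-B. The proof of Lemma IV can be found in appendix D.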

Equation (16) is valid for any fixed value of $\eta$. To achieve the most rapid decay of $P_E$, the exponent must be maximized over $\eta$:

$$\lim_{L \to \infty} -\frac{\log P_E}{L} = \max_{\eta} \sum_{j=1}^{J} \gamma_j u_j\Big(\frac{\beta_j^*}{\eta_j}\Big) \qquad (18)$$

where $\beta^*$ is defined for any value of the vector $\eta$ in Lemma IV. Theorem II solves the maximization problem in (18) and identifies the asymptotically optimum rate allocation (for large number of paths).

Theorem II. Consider a point-to-point connection over the network with $L$ independent paths from the source to the destination, each modeled as a Gilbert-Elliot cell, with a large enough bandwidth constraint (footnote 3). The paths are from $J$ different types, $L_j$ paths from the type $j$. Assume a block FEC of size $[N, K]$ is sent during a time interval $T$. Let $N_j$ denote the number of packets in a block of size $N$ assigned to the paths of type $j$, such that $\sum_{j=1}^{J} N_j = N$. The rate allocation vector $\eta$ is defined as $\eta_j = N_j/N$. For fixed values of $\gamma_j = L_j/L$, $n_0 = N/L$, $k_0 = K/L$, $T$ and asymptotically large number of paths $L$, the optimum rate allocation vector can be found by solving the following optimization problem:

$$\max_{\eta}\ g(\eta) \quad \text{s.t.} \quad \sum_{j=1}^{J} \eta_j = 1, \quad 0 \le \eta_j \le 1,$$

where $g(\eta) = \sum_{j=1}^{J} \gamma_j u_j\big(\frac{\beta_j^*}{\eta_j}\big)$, and $\beta^*$ is an implicit function of $\eta$ defined in Lemma IV. The functions $u_j()$ and $l_j()$ are defined in subsections IV-B and IV-C. Solving the above optimization problem gives the unique solution $\eta^*$ as

$$\eta_j^* = \begin{cases} 0 & \text{if } \alpha \le E\{x_j\} \\[4pt] \dfrac{\gamma_j l_j(\alpha)}{\sum_{i=1,\ \alpha > E\{x_i\}}^{J} \gamma_i l_i(\alpha)} & \text{otherwise} \end{cases} \qquad (19)$$

if there is at least one $1 \le j \le J$ for which $\alpha > E\{x_j\}$. Otherwise, when $\alpha \le E\{x_j\}$ for all $1 \le j \le J$, the maximum value is zero for any arbitrary rate allocation vector, $\eta$. In any case, the maximum value of the objective function is $g(\eta^*) = \sum_{j=1}^{J} \gamma_j u_j(\alpha)$, which is indeed the exponent of $P_E$ versus $L$. The proof of the theorem can be found in appendix E.

Footnote 3: By the term 'large enough', we mean the bandwidth constraint on a path of type $j$, $W_j$, satisfies the condition $\frac{n_0}{\gamma_j T} \le W_j$. The reason is that $\eta_j$ must satisfy both conditions of $0 \le \eta_j \le 1$ and $\frac{\eta_j n_0}{\gamma_j T} \le W_j$, simultaneously. When $W_j$ is large enough such that $\frac{n_0}{\gamma_j T} \le W_j$, the latter condition is automatically satisfied, and the optimization problem can be solved.

Fig. 6. (a) $P_E$ versus $L$ for the combination of two path types, one third from type I and the rest from type II. (b) The normalized aggregated weight of type I paths in the optimal rate allocation ($\eta_1^{opt}$), compared with the value of $\eta_1$ which maximizes the exponent of equation (18) ($\eta_1^*$).
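The closed form (19) can be evaluated numerically once $l_j(\alpha)$ is available. The following sketch assumes, purely for illustration, that the bad-time fraction $x$ of each path type has a small discrete distribution on [0, 1]; the function names and the example distributions are hypothetical. It obtains $l(\alpha)$ as the root of $v(\lambda) = \alpha$, with $v(\lambda)$ as defined in equation (24) of appendix A, by bisection.

```python
import math

def tilted_mean(xs, ps, lam):
    """v(lambda) = E{x e^(lambda x)} / E{e^(lambda x)} for a discrete x."""
    w = [p * math.exp(lam * x) for x, p in zip(xs, ps)]
    return sum(x * wi for x, wi in zip(xs, w)) / sum(w)

def l_of(xs, ps, alpha, lo=-200.0, hi=200.0, iters=200):
    """l(alpha): root of v(lambda) = alpha, by bisection (v is increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(xs, ps, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def asymptotic_allocation(types, gammas, alpha):
    """Equation (19): eta_j is proportional to gamma_j * l_j(alpha) when
    alpha > E{x_j}, and zero otherwise. Returns None if no type qualifies."""
    weights = []
    for (xs, ps), g in zip(types, gammas):
        mean = sum(x * p for x, p in zip(xs, ps))
        weights.append(g * l_of(xs, ps, alpha) if alpha > mean else 0.0)
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else None

# Three hypothetical path types with E{x} = 0.02, 0.07 and 0.20.
xs = [0.0, 0.5, 1.0]
types = [(xs, [0.97, 0.02, 0.01]), (xs, [0.90, 0.06, 0.04]), (xs, [0.70, 0.20, 0.10])]
eta = asymptotic_allocation(types, [1 / 3, 1 / 3, 1 / 3], alpha=0.1)
```

In this example the third type has $E\{x\} > \alpha$ and therefore receives zero rate, while the first type, whose mean is furthest below $\alpha$, receives the largest share.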

Remark III. Theorem II can be interpreted as follows. For large values of $L$, adding a new type of path contributes to the path diversity iff the path satisfies the quality constraint $\alpha > E\{x\}$, where $x$ is the percentage of time that the path spends in the bad state in the time interval $[0, T]$. Only in this case, adding the new type of path exponentially improves the performance of the system in terms of the probability of irrecoverable loss.

Remark IV. Observing the exponent coefficient corresponding to the optimum allocation vector $\eta^*$, we can see that the typical error event occurs when the ratio of the lost packets on all types of paths is the same as the total fraction of the lost packets, $\alpha$. However, this is not the case for an arbitrary rate allocation vector $\eta$.

Remark V. An interesting extension of Theorem II is the case where all types have identical erasure patterns ($u_j(x) = u_k(x)$ for all $1 \le j, k \le J$ and all $x$), but different bandwidth constraints. Adopting the notation of Theorem II, the bandwidth constraint on $\eta_j$ can be written as $\frac{\eta_j n_0}{\gamma_j T} \le W_j$, where $W_j$ is the maximum bandwidth for a path of type $j$. Let us define $\tilde{\eta}^*$ as the allocation vector which maximizes the objective function of Theorem II ($g(\eta)$), and satisfies the bandwidth constraints too. $\eta^*$ is also defined as the maximizing vector for the unconstrained problem in Theorem II. According to equation (19), we have $\eta_j^* = \gamma_j$ for all $1 \le j \le J$. It is obvious that $\tilde{\eta}^* = \eta^*$ if $\eta^*$ satisfies the bandwidth constraint for all $j$. In case $\eta^*$ does not satisfy the bandwidth constraint for some $j$, $\tilde{\eta}^*$ can be found by the water-filling algorithm. More accurately, we have

$$\tilde{\eta}_j^* = \begin{cases} \dfrac{\gamma_j W_j T}{n_0} & \text{if } \dfrac{W_j T}{n_0} < \Gamma \\[6pt] \gamma_j \Gamma & \text{otherwise} \end{cases} \qquad (20)$$

where $\Gamma$ can be found by imposing the condition $\sum_{j=1}^{J} \tilde{\eta}_j^* = 1$. Figure 7 depicts water-filling among identical paths with four different bandwidth constraints. The proof of equation (20) can be found in appendix F.
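The water-filling step can be sketched as follows, reading the rule as $\tilde{\eta}_j = \min(c_j, \gamma_j \Gamma)$, where $c_j$ is the largest value of $\eta_j$ that the bandwidth of type $j$ admits. The function name, the use of bisection to find the level $\Gamma$, and the numbers in the example are assumptions made for this illustration.

```python
def water_fill(gammas, caps, iters=200):
    """Return eta_j = min(caps[j], gammas[j] * level), with the common level
    chosen by bisection so that the entries sum to one.
    caps[j] is the largest eta_j allowed by the bandwidth of type j."""
    if sum(caps) < 1.0:
        raise ValueError("the bandwidth constraints cannot carry the whole block")
    lo, hi = 0.0, max(c / g for c, g in zip(caps, gammas))
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        if sum(min(c, g * level) for c, g in zip(caps, gammas)) < 1.0:
            lo = level
        else:
            hi = level
    return [min(c, g * hi) for c, g in zip(caps, gammas)]

# Four equally populated types; the first two are bandwidth limited.
eta_capped = water_fill([0.25] * 4, [0.1, 0.2, 0.5, 0.5])
# With loose constraints the allocation falls back to eta_j = gamma_j.
eta_free = water_fill([0.25] * 4, [1.0] * 4)
```

In the first call the two constrained types are filled to their caps (0.1 and 0.2) and the remaining 0.7 is split equally between the other two types.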

Figure 6(a) shows $P_E$ of the optimum rate allocation versus $L$ for a system consisting of two types of path. The optimal rate allocation is found by exhaustive search among all possible allocation vectors. The block transmission time is $T = 200$ ms. The block length is proportional to the number of paths as $N = 20L$. The average good burst, $\mu_g$, is selected such that we have $\mu_g T$ = | for both types of paths. $\gamma_1 = 1/3$ of the paths (of the first type) benefit from shorter bad bursts and a lower error probability of $\pi_{b,1} = 0.015$, and the rest (the second type) suffer from longer congestion bursts resulting in a higher error probability of $\pi_{b,2} = 0.025$. The coding overhead is $\alpha = 0.1$. The figure depicts a linear behavior in semi-logarithmic scale with the exponent of 0.403, which is comparable to the value 0.389 resulting from (19).

In the scenario of Fig. 6(a), let us denote $\eta_1^*$ as the value of the first element of $\eta^*$ in equation (19). Obviously, $\eta_1^*$ does not depend on $L$. Moreover, $\eta_1^{opt}$ is defined as the normalized aggregated weight of type I paths in the optimal rate allocation. Figure 6(b) compares $\eta_1^{opt}$ with $\eta_1^*$ for different numbers of paths. It is observed that $\eta_1^{opt}$ converges rapidly to $\eta_1^*$ as $L$ grows. Figure 6(a) also verifies that the allocation vector candidate $\eta^*$ proposed by Theorem II indeed meets the optimal allocation vector for large values of $L$.

Fig. 7. Water-filling algorithm over identical paths with four different bandwidth constraints.



V. Suboptimal Rate Allocation 

In order to compute the complexity of the rate allocation problem, we focus our attention on the original discrete formulation in subsection III-C. According to the model of subsection IV-C, we assume the available paths are from $J$ types, $L_j$ paths from type $j$, such that $\sum_{j=1}^{J} L_j = L$. Obviously, all the paths from the same type should have equal rate. Therefore, the rate allocation problem is turned into finding the vector $\mathbf{N} = (N_1, \ldots, N_J)$ such that $\sum_{j=1}^{J} N_j = N$, and $0 \le N_j \le L_j W_j T$ for all $j$. $N_j$ denotes the number of packets assigned to all the paths of type $j$. Let us temporarily assume that all paths have enough bandwidth such that $N_j$ can vary from 0 to $N$ for all $j$. There are $\binom{N+J-1}{J-1}$ $J$-dimensional non-negative vectors of the form $(N_1, \ldots, N_J)$ which satisfy the equation $\sum_{j=1}^{J} N_j = N$, each representing a distinct rate allocation. Hence, the number of candidates is exponential in terms of $J$.
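To give a sense of the size of this search space, the count above can be evaluated directly. This is a minimal sketch; the function name is chosen for this example only.

```python
from math import comb

def num_allocations(N, J):
    """Number of non-negative integer vectors (N_1, ..., N_J) that sum to N."""
    return comb(N + J - 1, J - 1)

# Block of N = 100 packets spread over J = 6 path types.
candidates = num_allocations(100, 6)
```

For $N = 100$ and $J = 6$ this already gives 96,560,646 candidate vectors, which is why exhaustive search is only practical for a small number of path types.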

First, we prove the problem of rate allocation is NP [58] in the sense that $P_E$ can be computed in polynomial time for any candidate vector $\mathbf{N} = (N_1, \ldots, N_J)$. Let us define $P_e^{\mathbf{N}}(k, j)$ as the probability of having more than $k$ errors over the paths of types 1 to $j$ for a specific allocation vector $\mathbf{N}$. We also define $Q_j(n, k)$ as the probability of having exactly $k$ errors out of the $n$ packets sent over the paths of type $j$. $Q_j(n, k)$ can be computed and stored for all path types and values of $n$ and $k$ with polynomial complexity as explained in appendices G and H. Then, the following recursive formula holds for $P_e^{\mathbf{N}}(k, j)$

$$P_e^{\mathbf{N}}(k, j) = \begin{cases} \sum_{i=0}^{N_j} Q_j(N_j, i)\, P_e^{\mathbf{N}}(k - i, j - 1) & \text{if } k \ge 0 \\ 1 & \text{if } k < 0 \end{cases}$$

$$P_e^{\mathbf{N}}(k, 1) = \sum_{i=k+1}^{N_1} Q_1(N_1, i). \qquad (21)$$

To compute $P_e^{\mathbf{N}}(K, J)$ by the above recursive formula, we apply a well-known technique in the theory of algorithms called memoization [59]. Memoization works by storing the computed values of a recursive function in an array. By keeping this array in the memory, memoization avoids recomputing the function for the same arguments when it is called later. To compute $P_e^{\mathbf{N}}(K, J)$, an array of size $O(KJ)$ is required. This array should be filled with the values of $P_e^{\mathbf{N}}(k, j)$ for $0 \le k \le K$ and $1 \le j \le J$. Computing $P_e^{\mathbf{N}}(k, j)$ requires $O(K)$ operations assuming the values of $P_e^{\mathbf{N}}(i, j - 1)$, $Q_j(N_j, i)$ and $\sum_{i=k+1}^{N_j} Q_j(N_j, i)$ are already computed for $0 \le i \le k$. Thus, $P_e^{\mathbf{N}}(K, J)$ can be computed with the complexity of $O(K^2 J)$ if the values of $Q_j(N_j, k)$ are given for all $N_j$ and $0 \le k \le K$. Following appendix H, we note that for each $j$, $Q_j(N_j, k)$ for $0 \le k \le K$ is computed offline with the complexity of $O(K^2 L_j) + O\big(\frac{N_j}{L_j} K\big)$. Hence, the total complexity of computing $P_e^{\mathbf{N}}(K, J)$ adds up to

$$\begin{aligned}
O(K^2 J) + \sum_{j=1}^{J} O\Big(K^2 L_j + \frac{N_j}{L_j} K\Big) &\overset{(a)}{=} O(K^2 J) + \sum_{j=1}^{J} O\big(K^2 L_j + N_j K\big) \\
&\overset{(b)}{=} O\big(K^2 L + KN\big) \qquad (22)
\end{aligned}$$

where (a) follows from the fact that $\frac{N_j}{L_j} \le N_j$, and the term $O(K^2 J)$ is omitted in (b) since we know that $J \le L$.
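The recursion (21) with memoization can be sketched as below. The per-type distribution `binomial_q` is a hypothetical stand-in (independent packet losses) used only to make the example runnable; in the paper $Q_j$ comes from the Gilbert-Elliot analysis of appendices G and H.

```python
from functools import lru_cache
from math import comb

def binomial_q(p):
    """Hypothetical Q_j(n, i): i losses out of n packets, i.i.d. loss prob. p."""
    return lambda n, i: comb(n, i) * p**i * (1 - p)**(n - i)

def loss_probability(alloc, k, Q):
    """P_e^N(k, J) of (21): probability of more than k errors over all types,
    where alloc[j-1] = N_j and Q[j-1] plays the role of Q_j."""
    @lru_cache(maxsize=None)          # memoization table indexed by (k, j)
    def pe(k, j):
        if k < 0:
            return 1.0
        nj = alloc[j - 1]
        if j == 1:
            return sum(Q[0](nj, i) for i in range(k + 1, nj + 1))
        return sum(Q[j - 1](nj, i) * pe(k - i, j - 1) for i in range(nj + 1))
    return pe(k, len(alloc))

# Two types with the same loss probability: the total is Binomial(10, 0.1).
p_two_types = loss_probability([4, 6], 1, [binomial_q(0.1)] * 2)
```

With identical independent losses the split between types does not matter, so the result equals the tail of a single binomial distribution, which gives a convenient sanity check.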

Now, we propose a suboptimal polynomial time algorithm to estimate the best path allocation vector, $\mathbf{N}^{opt}$. Let us define $P_e^{opt}(n, k, j)$ as the probability of having more than $k$ errors for a block of length $n$ over the paths of types 1 to $j$, minimized over all possible rate allocations ($\mathbf{N} = \mathbf{N}^{opt}$). First, we find a lowerbound $\hat{P}_e(n, k, j)$ for $P_e^{opt}(n, k, j)$ from the following recursive formula

$$\hat{P}_e(n, k, j) = \begin{cases} \min\limits_{0 \le n_j \le \min\{n, \lfloor L_j W_j T \rfloor\}} \sum_{i=0}^{n_j} Q_j(n_j, i)\, \hat{P}_e(n - n_j, k - i, j - 1) & \text{if } k \ge 0 \\ 1 & \text{if } k < 0 \end{cases}$$

$$\hat{P}_e(n, k, 1) = \sum_{i=k+1}^{n} Q_1(n, i). \qquad (23)$$

Using the memoization technique, we need an array of size $O(NKJ)$ to store the values of $\hat{P}_e(n, k, j)$ for $0 \le n \le N$, $0 \le k \le K$, and $1 \le j \le J$. According to the recursive definition above, computing $\hat{P}_e(n, k, j)$ requires $O(NK)$ operations assuming the values of $Q_j(n_j, i)$, $\hat{P}_e(n - n_j, k - i, j - 1)$ and $\sum_{i=k+1}^{n_j} Q_j(n_j, i)$ are already computed for all $i$ and $n_j$. Thus, it is easy to verify that $\hat{P}_e(N, K, J)$ can be computed with the complexity of $O(N^2 K^2 J)$ when the values of $Q_j(n_j, i)$ are given for all $0 \le n_j \le n$ and $0 \le i \le K$. According to appendix H, for each $1 \le j \le J$ and each $0 \le n_j \le N$, $Q_j(n_j, i)$ for all $0 \le i \le n_j$ is computed offline with the complexity of $O(n_j^2 L_j) + O\big(\frac{n_j}{L_j} n_j\big) = O(n_j^2 L_j)$. Thus, computing $Q_j(n_j, i)$ for all $1 \le j \le J$, $0 \le n_j \le N$, and $0 \le i \le n_j$ has the complexity of $\sum_{j=1}^{J} \sum_{n_j=1}^{N} O(n_j^2 L_j) = O(N^3 L)$. Finally, $\hat{P}_e(N, K, J)$ can be computed with the total complexity of $O(N^2 K^2 J + N^3 L)$.

The following lemma guarantees that $\hat{P}_e(n, k, j)$ is in fact a lowerbound for $P_e^{opt}(n, k, j)$.

Lemma V. $P_e^{opt}(n, k, j) \ge \hat{P}_e(n, k, j)$. The proof is given in appendix I.

The following algorithm recursively finds a suboptimum allocation vector $\mathbf{N}$ based on the lowerbound of Lemma V.

(1): Initialize $j \leftarrow J$, $n \leftarrow N$, $k \leftarrow K$.

(2): Set

$$N_j = \arg\min_{0 \le n_j \le \min\{n, \lfloor L_j W_j T \rfloor\}} \sum_{i=0}^{n_j} Q_j(n_j, i)\, \hat{P}_e(n - n_j, k - i, j - 1)$$

$$K_j = \arg\max_{0 \le i \le N_j} Q_j(N_j, i)\, \hat{P}_e(n - N_j, k - i, j - 1)$$

(3): Update $n \leftarrow n - N_j$, $k \leftarrow k - K_j$, $j \leftarrow j - 1$.

(4): If $j > 1$ and $k \ge 0$, goto (2).

(5): For $m = 1$ to $j$, set $N_m \leftarrow \lfloor n/j \rfloor$.

(6): $N_j \leftarrow N_j + \mathrm{Rem}(n, j)$, where $\mathrm{Rem}(a, b)$ denotes the remainder of dividing $a$ by $b$.

Intuitively speaking, the above algorithm tries to recursively find the typical error event ($K_j$'s) which has the maximum contribution to the error probability, and assigns the rate allocations ($N_j$'s) such that the estimated typical error probability ($\hat{P}_e$) is minimized. Indeed, Lemma V shows that the estimate used in the algorithm ($\hat{P}_e$) is a lower-bound for the minimum achievable error probability ($P_e^{opt}$). Comparing (23) and step (2) of our algorithm, we observe that the values of $N_j$ and $K_j$ can be found in $O(1)$ during the computation of $\hat{P}_e(N, K, J)$. Hence, the complexity of the proposed algorithm is the same as that of computing $\hat{P}_e(N, K, J)$, i.e., $O(N^2 K^2 J + N^3 L)$.
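Steps (1) to (6), together with the lowerbound recursion (23), can be sketched as follows. This is an illustrative reading rather than the authors' implementation: `binomial_q` is a hypothetical stand-in for $Q_j$, `cap[j-1]` stands for $\lfloor L_j W_j T \rfloor$, the argument `K` is used as the error threshold $k$ of (23), and the test of step (4) is evaluated before the first pass so that the single-type case is handled without a special branch.

```python
from functools import lru_cache
from math import comb

def binomial_q(p):
    """Hypothetical Q_j(n, i): i.i.d. packet losses with probability p."""
    return lambda n, i: comb(n, i) * p**i * (1 - p)**(n - i)

def suboptimal_allocation(N, K, Q, cap):
    """Return (allocation vector, lowerbound of (23) at (N, K, J))."""
    J = len(Q)

    @lru_cache(maxsize=None)
    def term(n, k, j, nj):
        # Sum inside the min of (23) for one candidate n_j.
        return sum(Q[j - 1](nj, i) * p_hat(n - nj, k - i, j - 1)
                   for i in range(nj + 1))

    @lru_cache(maxsize=None)
    def p_hat(n, k, j):
        if k < 0:
            return 1.0
        if j == 1:
            return sum(Q[0](n, i) for i in range(k + 1, n + 1))
        return min(term(n, k, j, nj) for nj in range(min(n, cap[j - 1]) + 1))

    alloc = [0] * J
    n, k, j = N, K, J                                    # step (1)
    while j > 1 and k >= 0:                              # step (4)
        options = range(min(n, cap[j - 1]) + 1)
        nj = min(options, key=lambda m: term(n, k, j, m))            # step (2)
        kj = max(range(nj + 1),
                 key=lambda i: Q[j - 1](nj, i) * p_hat(n - nj, k - i, j - 1))
        alloc[j - 1] = nj
        n, k, j = n - nj, k - kj, j - 1                  # step (3)
    for m in range(j):                                   # step (5)
        alloc[m] = n // j
    alloc[j - 1] += n % j                                # step (6)
    return alloc, p_hat(N, K, J)

# A good type (loss 0.05) and a poor type (loss 0.2), no binding bandwidth cap.
alloc, bound = suboptimal_allocation(12, 2, [binomial_q(0.05), binomial_q(0.2)], [12, 12])
```

With memoryless losses there is no diversity gain, so the sketch assigns the whole block to the better type; with the bursty $Q_j$ of appendices G and H the allocation would generally be spread over several types.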

The following theorem guarantees that the output of the above algorithm converges to the asymptotically optimal rate allocation introduced in Theorem II of section IV-C, and accordingly, it performs optimally for large number of paths.

Theorem III. Consider a point-to-point connection over the network with $L$ independent paths from the source to the destination, each modeled as a Gilbert-Elliot cell with a large enough bandwidth constraint. The paths are from $J$ different types, $L_j$ paths from the type $j$. Assume a block FEC of the size $[N, K]$ is sent during a time interval $T$. For fixed values of $\gamma_j = L_j/L$, $n_0 = N/L$, $k_0 = K/L$, $T$ and asymptotically large number of paths ($L$) we have

1) $\hat{P}_e(N, K, J) \doteq P_e^{opt}(N, K, J) \doteq e^{-L \sum_{j=1}^{J} \gamma_j u_j(\alpha)}$

2) $\frac{N_j}{N} = \eta_j^* + o(1)$

3) $\frac{K_j}{N_j} = \alpha + o(1)$ for $\alpha > E\{x_j\}$,

where $\alpha$ and $u_j()$ are defined in subsections IV-B and IV-C. $\hat{P}_e(N, K, J)$ is the lowerbound for $P_e^{opt}(n, k, j)$ defined in equation (23). $N_j$ is the total number of packets assigned to the paths of type $j$ by the suboptimal rate allocation algorithm. $\eta^*$ is the asymptotically optimal rate allocation given in equation (19). $K_j$ is also defined in step (2) of the algorithm. The notation $f(L) = o(g(L))$ means $\lim_{L \to \infty} \frac{f(L)}{g(L)} = 0$. The proof can be found in appendix J.

The proposed algorithm is compared with four other allocation schemes over $L = 6$ paths in Fig. 8. The optimal method uses exhaustive search over all possible allocations. 'Best Path Allocation' assigns everything to the best path only, ignoring the rest. The 'Equal Distribution' scheme distributes the packets among all paths equally. Finally, the 'Asymptotically Optimal' allocation assigns the rates based on equation (19). The block length and the number of information packets are assumed to be $N = 100$ and $K = 90$, respectively. The overall rate is $S_{req} = 1000$ pkt/sec, which results in $T = 100$ ms. The average good burst, $\mu_g$, is selected such that we have $\mu_g T$ = ~. However, the qualities of the paths are different as they have different average bad burst durations. The packet error probabilities of the paths are placed symmetrically around 0.0175, in three $\pm$ pairs whose offsets grow with $\Delta$, such that the median is fixed at 0.0175. $\Delta$ is thus a measure of deviation from this median. $\Delta = 0$ represents the case where all the paths are identical. The larger $\Delta$ is, the more variety we have among the paths and the more diversity gain might be achieved using a judicious rate allocation.

As seen, our suboptimal algorithm tracks the optimal algorithm so closely that the corresponding curves are not easily distinguishable over a wide range. However, the 'Asymptotically Optimal' rate allocation results in lower performance since there is only one path from each type, which makes the asymptotic analysis assumptions invalid. When $\Delta = 0$, the 'Equal Distribution' scheme obviously coincides with the optimal allocation. This scheme eventually diverges from the optimal algorithm as $\Delta$ grows. However, it still outperforms the best path allocation method as long as $\Delta$ is not too large. For very large values of $\Delta$, the best path dominates all the other ones, and we can ignore the rest of the paths. Hence, the best path allocation eventually converges to the optimal scheme when $\Delta$ increases.

Fig. 8. Optimal and suboptimal rate allocations are compared with equal distribution and best path allocation schemes for different values of $\Delta$.

VI. Conclusion

In this work, we have studied the performance of forward error correction over a block of packets sent through multiple independent paths. It is known that Maximum Distance Separable (MDS) block codes are optimum over our End-to-End Channel model, and any other erasure channel with or without memory, in the sense that their probability of error is minimum among all block codes of the same size [6], [7]. Adopting MDS codes, the probability of irrecoverable loss, $P_E$, is analyzed for the cases of a single path, multiple identical paths, and multiple non-identical paths based on the discrete to continuous relaxation. When there are $L$ identical paths, $P_E$ is upperbounded using large deviation theory. This bound is shown to be exponentially tight in terms of $L$. The asymptotic analysis shows that the exponential decay of $P_E$ with $L$ is still valid in the case of non-identical paths. Furthermore, the optimal rate allocation problem is solved in the asymptotic case where $L$ is very large. It is seen that for the optimal rate allocation, each path is assigned a positive rate iff its quality is above a certain threshold. The quality of a path is defined as the percentage of the time it spends in the bad state. Finally, we focus on the problem of optimum rate allocation when $L$ is not necessarily large. A heuristic suboptimal algorithm is proposed which computes a near-optimal allocation in polynomial time. For large values of $L$, the result of this algorithm converges to the optimal solution. Moreover, simulation results are provided which verify the validity of our theoretical analyses in several practical scenarios, and also show that the proposed suboptimal algorithm approximates the optimal allocation very closely.



Appendix A 



Proof of Lemma I 



1) We define the function $v(\lambda)$ as

$$v(\lambda) = \frac{E\{x e^{\lambda x}\}}{E\{e^{\lambda x}\}}. \qquad (24)$$

Then, the first derivative of $v(\lambda)$ will be

$$v'(\lambda) = \frac{E\{x^2 e^{\lambda x}\}\, E\{e^{\lambda x}\} - \big[E\{x e^{\lambda x}\}\big]^2}{\big[E\{e^{\lambda x}\}\big]^2}. \qquad (25)$$

According to the Cauchy-Schwarz inequality, the following statement is always true for any two functions $f()$ and $g()$

$$\Big( \int f(x) g(x)\, dx \Big)^2 < \int f^2(x)\, dx \int g^2(x)\, dx \qquad (26)$$

unless $f(x) = K g(x)$ for a constant $K$ and all values of $x$. If we choose $f(x) = \sqrt{x^2 Q(x) e^{x\lambda}}$ and $g(x) = \sqrt{Q(x) e^{x\lambda}}$, they cannot be proportional to each other for all values of $x$. Therefore, the numerator of equation (25) has to be strictly positive for all $\lambda$. Since the function $v(\lambda)$ is strictly increasing, it has an inverse $v^{-1}(\alpha)$ which is also strictly increasing. Moreover, the non-linear equation $v(\lambda) = \alpha$ has a unique solution of the form $\lambda = v^{-1}(\alpha) = l(\alpha)$.

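2) To show that $l(\alpha = 0) = -\infty$, we prove an equivalent statement of the form $\lim_{\lambda \to -\infty} v(\lambda) = 0$. Since $x$ is a random variable in the range $[0, 1]$ with the probability density function $Q(x)$, for any $0 < \epsilon < 1$, we can write

$$\begin{aligned}
\lim_{\lambda \to -\infty} v(\lambda) &= \lim_{\lambda \to -\infty} \frac{\int_0^{\epsilon} x Q(x) e^{x\lambda} dx + \int_{\epsilon}^{1} x Q(x) e^{x\lambda} dx}{\int_0^{1} Q(x) e^{x\lambda} dx} \\
&\le \lim_{\lambda \to -\infty} \frac{\int_0^{\epsilon} x Q(x) e^{x\lambda} dx}{\int_0^{\epsilon} Q(x) e^{x\lambda} dx} + \frac{\int_{\epsilon}^{1} x Q(x) dx}{\int_0^{\epsilon} Q(x) e^{(x - \epsilon)\lambda} dx} \\
&\overset{(a)}{=} \lim_{\lambda \to -\infty} \frac{\int_0^{\epsilon} x Q(x) e^{x\lambda} dx}{\int_0^{\epsilon} Q(x) e^{x\lambda} dx} \\
&\overset{(b)}{=} \lim_{\lambda \to -\infty} \frac{x_1 Q(x_1) e^{\lambda x_1}}{Q(x_2) e^{\lambda x_2}} \qquad (27)
\end{aligned}$$

for some $x_1, x_2 \in [0, \epsilon]$. (a) follows from the fact that for $x \in [0, \epsilon]$, $(x - \epsilon)\lambda \to +\infty$ when $\lambda \to -\infty$, and (b) is a result of the mean value theorem for integration [60]. This theorem states that for every continuous function $f(x)$ in the interval $[a, b]$, we have

$$\exists\, x_0 \in [a, b] \ \text{s.t.} \ \int_a^b f(x)\, dx = f(x_0)(b - a). \qquad (28)$$

Equation (27) is valid for any arbitrary $0 < \epsilon < 1$. If we choose $\epsilon \to 0$, $x_1$ and $x_2$ are both squeezed in the interval $[0, \epsilon]$. Thus, we have

$$\lim_{\lambda \to -\infty} v(\lambda) \le \lim_{\lambda \to -\infty} \lim_{\epsilon \to 0} \frac{x_1 Q(x_1) e^{\lambda x_1}}{Q(x_2) e^{\lambda x_2}} = \lim_{\epsilon \to 0} x_1 = 0. \qquad (29)$$

Based on the distribution of $x$, $v(\lambda)$ is obviously non-negative for any $\lambda$. Hence, the inequality in (29) can be replaced by equality.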



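3) By observing that $v(\lambda = 0) = E\{x\}$, it is obvious that $l(\alpha = E\{x\}) = 0$.

4) To show that $l(\alpha = 1) = +\infty$, we prove the equivalent statement of the form $\lim_{\lambda \to +\infty} v(\lambda) = 1$. For any $0 < \epsilon < 1$ and $x \in [1 - \epsilon, 1]$, $(x - 1 + \epsilon)\lambda \to +\infty$ when $\lambda \to +\infty$. Then, defining $\zeta = 1 - \epsilon$, we have

$$\lim_{\lambda \to +\infty} \frac{\int_0^{\zeta} x Q(x) e^{x\lambda} dx}{\int_0^{1} Q(x) e^{x\lambda} dx} \le \lim_{\lambda \to +\infty} \frac{\int_0^{\zeta} x Q(x) dx}{\int_{\zeta}^{1} Q(x) e^{(x - \zeta)\lambda} dx} = 0. \qquad (30)$$

Since the fraction in (30) is obviously non-negative for all $\lambda$, this inequality can be replaced by an equality. Similarly, we have

$$\lim_{\lambda \to +\infty} \frac{\int_0^{\zeta} Q(x) e^{x\lambda} dx}{\int_{\zeta}^{1} x Q(x) e^{x\lambda} dx} \le \lim_{\lambda \to +\infty} \frac{\int_0^{\zeta} Q(x) dx}{\int_{\zeta}^{1} x Q(x) e^{(x - \zeta)\lambda} dx} = 0 \qquad (31)$$

which can also be replaced by equality. Now, the limit of $v(\lambda)$ is written as

$$\begin{aligned}
\lim_{\lambda \to +\infty} v(\lambda) &= \lim_{\lambda \to +\infty} \frac{\int_0^{\zeta} x Q(x) e^{x\lambda} dx + \int_{\zeta}^{1} x Q(x) e^{x\lambda} dx}{\int_0^{1} Q(x) e^{x\lambda} dx} \\
&\overset{(a)}{=} \lim_{\lambda \to +\infty} \frac{\int_{\zeta}^{1} x Q(x) e^{x\lambda} dx}{\int_0^{1} Q(x) e^{x\lambda} dx} \\
&\overset{(b)}{=} \Big( \lim_{\lambda \to +\infty} \frac{\int_0^{\zeta} Q(x) e^{x\lambda} dx + \int_{\zeta}^{1} Q(x) e^{x\lambda} dx}{\int_{\zeta}^{1} x Q(x) e^{x\lambda} dx} \Big)^{-1} \\
&\overset{(c)}{=} \Big( \lim_{\lambda \to +\infty} \frac{\int_{\zeta}^{1} Q(x) e^{x\lambda} dx}{\int_{\zeta}^{1} x Q(x) e^{x\lambda} dx} \Big)^{-1} \\
&\overset{(d)}{=} \Big( \lim_{\lambda \to +\infty} \frac{Q(x_1) e^{x_1 \lambda}}{x_2 Q(x_2) e^{x_2 \lambda}} \Big)^{-1} \qquad (32)
\end{aligned}$$

for some $x_1, x_2 \in [1 - \epsilon, 1]$. (a) follows from equation (30), and (b) is valid since the final result shows that $\lim_{\lambda \to +\infty} v(\lambda)$ is finite and non-zero [60]. (c) follows from equation (31), and (d) is a result of the mean value theorem for integration. If we choose $\epsilon \to 0$, $x_1$ and $x_2$ are both squeezed in the interval $[1 - \epsilon, 1]$. Then, equation (32) turns into

$$\lim_{\lambda \to +\infty} v(\lambda) = \Big( \lim_{\lambda \to +\infty} \lim_{\epsilon \to 0} \frac{Q(x_1) e^{x_1 \lambda}}{x_2 Q(x_2) e^{x_2 \lambda}} \Big)^{-1} = \Big( \lim_{\epsilon \to 0} \frac{1}{x_2} \Big)^{-1} = 1.$$

5) According to equations (12) and (13), the first derivative of $u(\alpha)$ is

$$\frac{du(\alpha)}{d\alpha} = l(\alpha) + \alpha \frac{\partial l(\alpha)}{\partial \alpha} - \frac{E\{x e^{\lambda x}\}}{E\{e^{\lambda x}\}} \frac{\partial l(\alpha)}{\partial \alpha} = l(\alpha),$$

where $\lambda = l(\alpha)$, so that the last two terms cancel because $v(l(\alpha)) = \alpha$.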

Appendix B 
Proof of Lemma II 

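Based on the definition of probability density function, we have

$$\begin{aligned}
\lim_{L \to \infty} -\frac{1}{L} \log f_y(\alpha) &= \lim_{L \to \infty} -\frac{1}{L} \log \Big( \lim_{\delta \to 0} \frac{P\{y > \alpha\} - P\{y > \alpha + \delta\}}{\delta} \Big) \\
&\overset{(a)}{=} \lim_{\delta \to 0} \lim_{L \to \infty} -\frac{1}{L} \log \Big( \frac{P\{y > \alpha\} - P\{y > \alpha + \delta\}}{\delta} \Big) \\
&\ge \lim_{\delta \to 0} \lim_{L \to \infty} \frac{1}{L} \big( -\log P\{y > \alpha\} + \log \delta \big) \\
&\overset{(b)}{=} u(\alpha) \qquad (33)
\end{aligned}$$

where (a) is valid since log is a continuous function, and both limits do exist and are interchangeable. (b) follows from equation (14). The exponent of $f_y(\alpha)$ can be upper-bounded as

$$\begin{aligned}
\lim_{L \to \infty} -\frac{1}{L} \log f_y(\alpha) &\overset{(a)}{=} \lim_{\delta \to 0} \lim_{L \to \infty} \frac{-\log \big( P\{y > \alpha\} - P\{y > \alpha + \delta\} \big) + \log \delta}{L} \\
&\overset{(b)}{\le} \lim_{\delta \to 0} \lim_{L \to \infty} \frac{-\log \big( e^{-L(u(\alpha) + \epsilon)} - e^{-L(u(\alpha + \delta) - \epsilon)} \big) + \log \delta}{L} \\
&= \lim_{\delta \to 0} \lim_{L \to \infty} u(\alpha) + \epsilon - \frac{\log \big( 1 - e^{-L\chi} \big)}{L} \\
&\overset{(c)}{=} u(\alpha) + \epsilon \qquad (34)
\end{aligned}$$

where $\chi = u(\alpha + \delta) - u(\alpha) - 2\epsilon$. Since $u(\alpha)$ is a strictly increasing function (Lemma I), we can make $\chi$ positive by choosing $\epsilon$ small enough. (a) is valid since log is a continuous function, and both limits do exist and are interchangeable, (b) follows from the definition of limit if $L$ is sufficiently large, and (c) is a result of $\chi$ being positive. Selecting $\epsilon$ arbitrarily small, results (33) and (34) prove the lemma.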

Appendix C 
Proof of Lemma III 

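According to the definition of infimum, we have

$$\lim_{L \to \infty} -\frac{\log H(L)}{L} \ge \lim_{L \to \infty} -\frac{1}{L} \log \Big( e^{-L \inf_{S} h(\mathbf{x})} \int_{S} d\mathbf{x} \Big) \overset{(a)}{=} \inf_{S} h(\mathbf{x}), \qquad (35)$$

where (a) follows from the fact that $S$ is a bounded region. Since $h(\mathbf{x})$ is a continuous function, it has a minimum in the bounded closed set $\mathrm{cl}(S)$, attained at a point denoted by $\mathbf{x}^*$. Due to the continuity of $h(\mathbf{x})$ at $\mathbf{x}^*$, for any $\epsilon > 0$, there is a neighborhood $B(\epsilon)$ centered at $\mathbf{x}^*$ such that any $\mathbf{x} \in B(\epsilon)$ has the property of $|h(\mathbf{x}) - h(\mathbf{x}^*)| < \epsilon$. Moreover, since $S$ is a convex set, we have $\mathrm{vol}(B(\epsilon) \cap S) > 0$. Now, we can write

$$\begin{aligned}
\lim_{L \to \infty} -\frac{\log H(L)}{L} &\le \lim_{L \to \infty} -\frac{1}{L} \log \Big( \int_{S \cap B(\epsilon)} e^{-L h(\mathbf{x})} d\mathbf{x} \Big) \\
&\le \lim_{L \to \infty} -\frac{1}{L} \log \Big( e^{-L (h(\mathbf{x}^*) + \epsilon)} \int_{S \cap B(\epsilon)} d\mathbf{x} \Big) \\
&= h(\mathbf{x}^*) + \epsilon. \qquad (36)
\end{aligned}$$

Selecting $\epsilon$ to be arbitrarily small, (35) and (36) prove the lemma.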

Appendix D 
Proof of Lemma IV 

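According to Lemma I, $u_j(x)$ is increasing and convex for all $1 \le j \le J$. Thus, the objective function $f(\beta) = \sum_{j=1}^{J} \gamma_j u_j\big(\frac{\beta_j}{\eta_j}\big)$ is also convex, and the region $S_T$ is determined by $J$ convex inequality constraints and one affine equality constraint. Hence, in this case, KKT conditions are both necessary and sufficient for optimality [61]. In other words, if there exist constants $\phi_j$ and $\nu$ such that

$$\frac{\gamma_j}{\eta_j} u_j'\Big(\frac{\beta_j^*}{\eta_j}\Big) - \phi_j - \nu = 0 \qquad \forall\, 1 \le j \le J \qquad (37)$$

$$\phi_j \big[ \eta_j E\{x_j\} - \beta_j^* \big] = 0 \qquad \forall\, 1 \le j \le J \qquad (38)$$

then the point $\beta^*$ is a global minimum.

Now, we prove that either $\beta_j^* = \eta_j E\{x_j\}$ for all $1 \le j \le J$, or $\beta_j^* > \eta_j E\{x_j\}$ for all $1 \le j \le J$. Let us assume the opposite is true, and there are at least two elements of the vector $\beta^*$, indexed with $k$ and $m$, which have the values of $\beta_k^* = \eta_k E\{x_k\}$ and $\beta_m^* > \eta_m E\{x_m\}$, respectively. For any arbitrary $\epsilon > 0$, the vector $\beta^{**}$ can be defined as below

$$\beta_j^{**} = \begin{cases} \beta_k^* + \epsilon & \text{if } j = k \\ \beta_m^* - \epsilon & \text{if } j = m \\ \beta_j^* & \text{otherwise.} \end{cases} \qquad (39)$$

Then, we have

$$\begin{aligned}
\lim_{\epsilon \to 0} \frac{f(\beta^{**}) - f(\beta^*)}{\epsilon} &= \lim_{\epsilon \to 0} \frac{1}{\epsilon} \Big\{ \gamma_k u_k\Big(\frac{\beta_k^* + \epsilon}{\eta_k}\Big) - \gamma_k u_k\Big(\frac{\beta_k^*}{\eta_k}\Big) + \gamma_m u_m\Big(\frac{\beta_m^* - \epsilon}{\eta_m}\Big) - \gamma_m u_m\Big(\frac{\beta_m^*}{\eta_m}\Big) \Big\} \\
&\overset{(a)}{=} \lim_{\epsilon \to 0} \Big\{ \frac{\gamma_k}{\eta_k} u_k'\Big(\frac{\beta_k^* + \epsilon'}{\eta_k}\Big) - \frac{\gamma_m}{\eta_m} u_m'\Big(\frac{\beta_m^* - \epsilon''}{\eta_m}\Big) \Big\} \\
&= -\frac{\gamma_m}{\eta_m} u_m'\Big(\frac{\beta_m^*}{\eta_m}\Big) < 0 \qquad (40)
\end{aligned}$$

where $\epsilon', \epsilon'' \in [0, \epsilon]$, and (a) follows from Taylor's theorem. The last step uses the fact that $u_k'$ vanishes at $E\{x_k\}$. Thus, moving from $\beta^*$ to $\beta^{**}$ decreases the function, which contradicts the assumption of $\beta^*$ being the global minimum.

Out of the remaining possibilities, the case where $\beta_j^* = \eta_j E\{x_j\}$ (for all $1 \le j \le J$) obviously agrees with Lemma IV for the special case of $\nu = 0$. Therefore, the lemma can be proved assuming $\beta_j^* > \eta_j E\{x_j\}$ (for all $1 \le j \le J$). Then, equation (38) turns into $\phi_j = 0$ (for all $1 \le j \le J$). By rearranging equation (37) and using the condition $\sum_{j=1}^{J} \beta_j^* = \alpha$, Lemma IV is proved.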

Appendix E 
Proof of Theorem II 

Sketch of the proof: First, it is proved that $\eta_j^* > 0$ if $E\{x_j\} < \alpha$. At the second step, we prove that $\eta_j^* = 0$ if $E\{x_j\} \ge \alpha$. Then, KKT conditions [61] are applied for the indices $1 \le k \le J$ where $E\{x_k\} < \alpha$ to find the maximizing allocation vector, $\eta^*$.

Proof: The parameter $\nu$ is obviously a function of the vector $\eta$. Differentiating equation (17) with respect to $\eta_k$ results in



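$$\frac{\partial \nu}{\partial \eta_k} = -\frac{v_k\big(\frac{\nu \eta_k}{\gamma_k}\big) + \frac{\nu \eta_k}{\gamma_k} v_k'\big(\frac{\nu \eta_k}{\gamma_k}\big)}{\sum_{j=1}^{J} \frac{\eta_j^2}{\gamma_j} v_j'\big(\frac{\nu \eta_j}{\gamma_j}\big)} \qquad (41)$$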



where $v_j(x) = l_j^{-1}(x)$, and $v_j'(x)$ denotes its derivative with respect to its argument. The objective function can be simplified as

$$g(\eta) = \sum_{j=1}^{J} \gamma_j u_j\Big(\frac{\beta_j^*}{\eta_j}\Big) = \sum_{j=1}^{J} \gamma_j u_j\Big( v_j\Big(\frac{\nu \eta_j}{\gamma_j}\Big) \Big). \qquad (42)$$

$\nu^*$ is defined as the value of $\nu$ corresponding to $\eta^*$. Next, we show that $\nu^* > 0$. Let us assume the opposite is true, i.e., $\nu^* \le 0$. Then, according to Lemma I, we have $v_j\big(\frac{\nu^* \eta_j^*}{\gamma_j}\big) \le E\{x_j\}$ for all $j$, which results in $g(\eta^*) = 0$. However, it is possible to achieve a positive value of $g(\eta)$ by setting $\eta_j = 1$ for one type which has the property of $E\{x_j\} < \alpha$, and setting $\eta_j = 0$ for the rest. Thus, $\eta^*$ cannot be the maximal point. This contradiction proves the fact that $\nu^* > 0$.

At the first step, we prove that $\eta_j^* > 0$ if $E\{x_j\} < \alpha$. Assume the opposite is true for an index $1 \le k \le J$. Since $\sum_{j=1}^{J} \eta_j^* = 1$, there should be at least one index $m$ such that $\eta_m^* > 0$. For any arbitrary $\epsilon > 0$, the vector $\eta^{**}$ can be defined as below

$$\eta_j^{**} = \begin{cases} \epsilon & \text{if } j = k \\ \eta_m^* - \epsilon & \text{if } j = m \\ \eta_j^* & \text{otherwise.} \end{cases} \qquad (43)$$

$\nu^{**}$ is defined as the corresponding value of $\nu$ for the vector $\eta^{**}$. Based on equation (41), we can write

Au = 

z/** - v* = (44) 

1m J 7m \ 1m J ^ 0(e 2 ^ 

Vi , ( v Vi 



r- 



\li V 7j 



Then, we have 



g (n**) - g{rf 



lim 

Mm i ( ^ r^iv - r ^ 



* 7i 

^m^y^J -E{x fe }| (45) 

where (a) follows from (44). If the value of (45) is positive for an index $m$, moving in that direction increases the objective function, which contradicts the assumption of $\eta^*$ being a maximal point. If the value of (45) is non-positive for all indices $m$ whose $\eta_m^* > 0$, we can write

E{a; fc } > »?m«m (—) = « (46) 

which obviously contradicts the assumption of $E\{x_k\} < \alpha$.

At the second step, we prove that $\eta_j^* = 0$ if $E\{x_j\} \ge \alpha$. Assume the opposite is true for an index $1 \le r \le J$. Since $\sum_{j=1}^{J} \eta_j^* = 1$, we should have $\eta_s^* < 1$ for all other indices $s$. For any arbitrary $\epsilon > 0$, the vector $\eta^{***}$ can be defined as

if j = r 

rf. + e ]fj = s (47) 
rj* otherwise. 

$\nu^{***}$ is defined as the corresponding value of $\nu$ for the vector $\eta^{***}$. Based on equation (41), we can write



Au = u***-u* 



E 



Then, we have 



? Ti V Tj 

^J^-^pjy+o^. (48) 



lim i r^ : ^v_^v, /Vtf. 



*°e i 7s 3 \ Is J 7r V 7r 

7, J V 7i ; 



W ,*M^V<U^)> (49) 



IT 

7r / V 7. 



where (a) follows from (48). If the value of (49) is positive for an index $s$, moving in that direction increases the objective function, which contradicts the assumption of $\eta^*$ being a maximal point. If the value of (49) is non-positive for all indices $s$ whose $\eta_s^* > 0$, we can write



E{Xr} < Vr (^^j < J2v> s = a (50) 

which obviously contradicts the assumption of $E\{x_r\} \ge \alpha$.

Now that the boundary points are checked, we can safely use the KKT conditions [61] for all $1 \le k \le J$, where $E\{x_k\} < \alpha$, to find the maximizing allocation vector, $\eta^*$.



7fe V Ik J \ 7j / 



M -^f^| (51) 
where $\zeta$ is a constant independent of $k$, and (a) follows from (41). Using the fact that $\sum_{j=1}^{J} \eta_j = 1$ together with equations (17) and (51) results in

C = -olW 

Combining equations (51) and (52) results in equation (19) and $g(\eta^*) = \sum_{j=1}^{J} \gamma_j u_j(\alpha)$.

Appendix F 
Proof of Remark V 

Based on arguments similar to the ones in appendix E, it can be shown that the bandwidth-constrained optimum of Remark V assigns zero rate to type $j$ iff $E\{x_j\} \ge \alpha$. Since all the types are identical here, this means every type receives a positive rate. Similar to equation (51), applying KKT conditions [61] gives us



-c if,7<^ T 



n 
l3 W 3 T 

where a/s are non-negative parameters [61]. Putting T = lj ^Zp proves equation (|20l) . 



(53) 



-C — <Jj if ?7* 



j " 1 j ~ 

n 



Appendix G 
Discrete Analysis of One Path 

$Q(n,k,l)$ is defined as the probability of having exactly $k$ errors out of the $n$ packets sent over the path $l$. Depending on the initial state of the path $l$, $P_g(n,k,l)$ and $P_b(n,k,l)$ are defined as the probabilities of having $k$ errors out of the $n$ packets sent over this path when we start the transmission in the good or in the bad state, respectively. It is easy to see that

$$Q(n,k,l) = \pi_g P_g(n,k,l) + \pi_b P_b(n,k,l). \tag{54}$$

$P_g(n,k,l)$ and $P_b(n,k,l)$ can be computed from the following recursive equations

$$\begin{aligned} P_b(n,k,l) &= \pi_{b|b} P_b(n-1,k-1,l) + \pi_{g|b} P_g(n-1,k-1,l) \\ P_g(n,k,l) &= \pi_{b|g} P_b(n-1,k,l) + \pi_{g|g} P_g(n-1,k,l) \end{aligned} \tag{55}$$

with the initial conditions

$$\begin{aligned} P_g(n,k,l) &= 0 \quad \text{for } k > n \\ P_b(n,k,l) &= 0 \quad \text{for } k > n \\ P_g(n,k,l) &= 0 \quad \text{for } k < 0 \\ P_b(n,k,l) &= 0 \quad \text{for } k < 0 \end{aligned} \tag{56}$$
where $\pi_{s_2|s_1}$ is the probability of the channel being in the state $s_2 \in \{g,b\}$ provided that it has been in the state $s_1 \in \{g,b\}$ when the last packet was transmitted. $\pi_{s_2|s_1}$ has the following values for different combinations of $s_1$ and $s_2$ [1]

$$\begin{aligned} \pi_{g|g} &= \pi_g + \pi_b e^{-\frac{\mu_g + \mu_b}{S_l}} \\ \pi_{b|b} &= \pi_b + \pi_g e^{-\frac{\mu_g + \mu_b}{S_l}} \\ \pi_{b|g} &= 1 - \pi_{g|g} \\ \pi_{g|b} &= 1 - \pi_{b|b} \end{aligned} \tag{57}$$

where $S_l$ denotes the transmission rate on the path $l$, i.e., the packets are transmitted on the path $l$ every $\frac{1}{S_l}$ seconds.
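The transition probabilities can be illustrated numerically. The following is a minimal sketch that assumes the standard two-state continuous-time Markov chain behind the Gilbert-Elliot model: the names `mu_g` and `mu_b` (rates of leaving the good and the bad state), the function name, and the dictionary keys are assumptions made for this example only.

```python
import math

def transition_probs(mu_g, mu_b, rate):
    """Per-packet transition probabilities when packets are sent every 1/rate seconds.

    Assumption: mu_g and mu_b are the rates of leaving the good and the bad
    state of a two-state continuous-time Markov chain.
    Key "x|y" means: probability of state x now, given state y at the last packet.
    """
    pi_g = mu_b / (mu_g + mu_b)          # stationary probability of the good state
    pi_b = mu_g / (mu_g + mu_b)          # stationary probability of the bad state
    decay = math.exp(-(mu_g + mu_b) / rate)
    pi_gg = pi_g + pi_b * decay
    pi_bb = pi_b + pi_g * decay
    return {"g|g": pi_gg, "b|g": 1.0 - pi_gg, "b|b": pi_bb, "g|b": 1.0 - pi_bb}
```

At a very low rate the consecutive packets see nearly independent channel states, so `"g|g"` approaches the stationary probability of the good state; at a very high rate the state is almost frozen between packets.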

According to the recursive equations in (55), to compute $P_b(n,k,l)$ and $P_g(n,k,l)$ by the memoization technique, the functions $P_b()$ and $P_g()$ should be calculated at the following set of points, denoted as $S(n,k)$:

$$S(n,k) = \{(n',k') \mid 0 \le k' \le k,\ n' - n + k \le k' \le n'\}.$$

The cardinality of the set $S(n,k)$ is of the order $|S(n,k)| = O(k(n-k))$. Since three operations are needed to compute the recursive functions $P_b()$ and $P_g()$ at each point, $P_b(n,k,l)$ and $P_g(n,k,l)$ are computable with the complexity of $O(k(n-k))$, which gives us $Q(n,k,l)$ according to equation (54).
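The recursion for one path can be sketched in a few lines. This is an illustrative bottom-up version, not the memoized evaluation over $S(n,k)$ described above: it fills the whole table and therefore costs $O(n^2)$. The function and parameter names are hypothetical, and the base case $P_g(0,0) = P_b(0,0) = 1$ is an assumption, since the text lists only the zero-valued boundary conditions.

```python
def loss_count_pmf(n, pi_g, pi_gg, pi_bb):
    """Return [Q(n, 0), ..., Q(n, n)] for a single path.

    pi_g  : stationary probability of the good state
    pi_gg : P(good now | good at the previous packet)
    pi_bb : P(bad now  | bad at the previous packet)
    """
    pi_b = 1.0 - pi_g
    pi_bg = 1.0 - pi_gg   # bad now, given good before
    pi_gb = 1.0 - pi_bb   # good now, given bad before
    # Assumed base case: with zero packets there are zero losses.
    pg, pb = [1.0], [1.0]
    for m in range(1, n + 1):
        new_pg = [0.0] * (m + 1)
        new_pb = [0.0] * (m + 1)
        for j in range(m + 1):
            if j <= m - 1:
                # Start in good: first packet arrives, j losses remain.
                new_pg[j] = pi_bg * pb[j] + pi_gg * pg[j]
            if j >= 1:
                # Start in bad: first packet is lost, j - 1 losses remain.
                new_pb[j] = pi_bb * pb[j - 1] + pi_gb * pg[j - 1]
        pg, pb = new_pg, new_pb
    # Average over the initial state, as in the stationary mixture for Q.
    return [pi_g * g + pi_b * b for g, b in zip(pg, pb)]
```

For example, `loss_count_pmf(2, 0.8, 0.9, 0.6)` gives the three probabilities of losing 0, 1, or 2 packets out of 2, and they sum to one.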

Appendix H 
Discrete Analysis of One Type 

When there are $n$ packets to be distributed over $L_j$ identical paths of type $j$, uniform distribution is obviously the optimum. However, since the integer $n$ may be indivisible by $L_j$, the $L_j$-dimensional vector $\mathbf{N}$ is selected as

$$N_l = \begin{cases} \left\lfloor \frac{n}{L_j} \right\rfloor + 1 & \text{for } 1 \le l \le \mathrm{Rem}(n, L_j) \\ \left\lfloor \frac{n}{L_j} \right\rfloor & \text{for } \mathrm{Rem}(n, L_j) < l \le L_j \end{cases} \tag{58}$$

where $\mathrm{Rem}(a,b)$ denotes the remainder of dividing $a$ by $b$. $\mathbf{N}$ represents the closest integer vector to a uniform distribution.
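As a small illustration of this near-uniform allocation, the sketch below builds the vector for given $n$ and $L_j$. The function name and the 0-based list layout are choices made for this example.

```python
def near_uniform_allocation(n, num_paths):
    """Split n packets over num_paths identical paths as evenly as possible.

    The first Rem(n, num_paths) paths carry one extra packet.
    """
    base, rem = divmod(n, num_paths)
    return [base + 1 if l < rem else base for l in range(num_paths)]
```

For instance, 10 packets over 4 paths give the allocation `[3, 3, 2, 2]`.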

$\mathbb{P}_{\mathbf{N}}(k,l)$ is defined as the probability of having exactly $k$ erasures among the packets transmitted over the identical paths 1 to $l$ with the allocation vector $\mathbf{N}$. According to the definitions of $Q_j(n,k)$ and $\mathbb{P}_{\mathbf{N}}(k,l)$, it is obvious that $Q_j(n,k) = \mathbb{P}_{\mathbf{N}}(k,L_j)$. $\mathbb{P}_{\mathbf{N}}(k,l)$ can be computed recursively as

$$\begin{aligned} \mathbb{P}_{\mathbf{N}}(k,l) &= \sum_{i=0}^{k} \mathbb{P}_{\mathbf{N}}(k-i,l-1)\, Q(N_l,i,l) \\ \mathbb{P}_{\mathbf{N}}(k,1) &= Q(N_1,k,1) \end{aligned} \tag{59}$$

where $Q(N_l,i,l)$ is given in Appendix G. Since all the paths are assumed to be identical here, $Q(N_l,k,l)$ is the same for all path indices $l$. According to the recursive equations in (55), the values of $Q(N_l,i,l)$ for all $0 \le i \le k$ and $1 \le l \le L_j$ can be calculated with the complexity of $O(N_l k) = O\left(\frac{nk}{L_j}\right)$. According to the recursive equations in (59), computing $\mathbb{P}_{\mathbf{N}}(k,l)$ requires memoization over an array of size $O(kl)$ whose entries can be calculated with $O(k)$ operations each. Thus, $\mathbb{P}_{\mathbf{N}}(k,l)$ is computable with the complexity of $O(k^2 l)$ if the $Q(N_l,i,l)$'s are already given. Finally, noting that $Q_j(n,k) = \mathbb{P}_{\mathbf{N}}(k,L_j)$, we can compute $Q_j(n,k)$ with the overall complexity of $O(k^2 L_j) + O\left(\frac{nk}{L_j}\right)$.
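The recursion over paths is a discrete convolution of the per-path loss distributions. The sketch below is illustrative only: it assumes each per-path distribution is already available as a list (`path_pmfs[l][i]` standing for the probability of `i` losses on path `l`), and it starts from the empty sum instead of the single-path base case, which gives the same result.

```python
def type_loss_pmf(path_pmfs):
    """Distribution of the total number of losses over independent paths.

    path_pmfs: list of per-path distributions; path_pmfs[l][i] is the
    probability of exactly i losses on path l (hypothetical input layout).
    Returns a list whose k-th entry is the probability of k losses in total.
    """
    total = [1.0]  # zero paths: zero losses with probability one
    for q in path_pmfs:
        new = [0.0] * (len(total) + len(q) - 1)
        for a, pa in enumerate(total):
            for i, qi in enumerate(q):
                # a losses on earlier paths plus i losses on this path
                new[a + i] += pa * qi
        total = new
    return total
```

Unlike the complexity count above, this version does not truncate at $k$; dropping entries beyond index $k$ after each path recovers the $O(k^2 l)$ cost.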

Appendix I 
Proof of Lemma V 

The lemma is proved by induction on $j$. The case of $j = 1$ is obviously true as $P_e(n,k,1) = P_e^{opt}(n,k,1)$. Let us assume this statement is true for $j = 1$ to $J-1$. Then, for $j = J$, we have

P e (n, k, J) 

(a) Nj 

< Y,Qj(N°j Pt ^)Pe(n-Nf,k-z,J-l) 

< ^2Qj(Nf,i)P^(n-NT,k-i,J-l) 
( ) Nj 

( = } Pr\k,J)=P e ° pt (n,k,J) 

where $\mathbf{N}^{opt}$ denotes the optimum allocation of $n$ packets among the $J$ types of paths such that the probability of having more than $k$ lost packets is minimized. (a) follows from the recursive equation (21), (b) is the induction assumption, (c) comes from the definition of $P_e^{opt}(n,k,l)$, and (d) is a result of equation (23).

Appendix J 
Proof of Theorem III 

Sketch of the proof: First, the asymptotic behavior of $Q_j(n,k)$ is analyzed, and it is shown that for large values of $L_j$ (or equivalently $L$), equation (63) computes the exponent of $Q_j(n,k)$ versus $L$. Next, we prove the first part of the theorem by induction on $J$. The proof of this part is divided into two different cases, depending on whether $\frac{K}{N}$ is larger than $\mathbb{E}\{x_J\}$ or vice versa. Finally, the second and the third parts of the theorem are proved by induction on $j$ while the total number of path types, $J$, is fixed. Again, the proof is divided into two different cases, depending on whether $\frac{K}{N}$ is larger than $\mathbb{E}\{x_j\}$ or vice versa.

Proof: First, we compute the asymptotic behavior of $Q_j(n,k)$ for $k > n\mathbb{E}\{x_j\}$, and $n$ growing proportionally to $L_j$, i.e. $n = n'L_j$. Here, we can apply Sanov's Theorem [56], [62] as $n$ and $k$ are discrete variables and $n'$ is a constant.

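Sanov's Theorem. Let $X_1, X_2, \ldots, X_n$ be i.i.d. discrete random variables from an alphabet set $\mathcal{X}$ with the size $|\mathcal{X}|$ and probability mass function (pmf) $Q(x)$. Let $\mathcal{P}$ denote the set of pmf's in $\mathbb{R}^{|\mathcal{X}|}$, i.e. $\mathcal{P} = \left\{ P \in \mathbb{R}^{|\mathcal{X}|} \mid P(i) \ge 0,\ \sum_{i=1}^{|\mathcal{X}|} P(i) = 1 \right\}$. Also, let $\mathcal{P}_L$ denote the subset of $\mathcal{P}$ corresponding to all possible empirical distributions of $X$ in $L$ observations [62], i.e. $\mathcal{P}_L = \{P \in \mathcal{P} \mid \forall i,\ LP(i) \in \mathbb{Z}\}$. For any dense and closed set [57] of pmf's $E \subseteq \mathcal{P}$, the probability that the empirical distribution of $L$ observations belongs to the set $E$ is equal to

$$\mathbb{P}\{E\} = \mathbb{P}\{E \cap \mathcal{P}_L\} \doteq e^{-L D(P^* \| Q)} \tag{60}$$

where $P^* = \arg\min_{P \in E} D(P\|Q)$ and $D(P\|Q) = \sum_{i=1}^{|\mathcal{X}|} P(i) \log \frac{P(i)}{Q(i)}$.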

Focusing our attention on the main problem, assume that $P$ is defined as the empirical distribution of the number of errors in each path, i.e. for $\forall i$, $1 \le i \le n'$, $P(i)$ shows the ratio of the total paths which contain exactly $i$ lost packets. Similarly, for $\forall i$, $1 \le i \le n'$, $Q(i)$ denotes the probability of exactly $i$ packets being lost out of the $n'$ packets transmitted on a path of type $j$. The sets $E$ and $E_{out}$ are defined as follows

$$\begin{aligned} E &= \left\{ P \in \mathcal{P} \,\Big|\, \sum_{i=0}^{n'} i P(i) \ge \beta \right\} \\ E_{out} &= \left\{ P \in \mathcal{P} \,\Big|\, \sum_{i=0}^{n'} i P(i) = \beta \right\} \end{aligned} \tag{61}$$

where $\beta = \frac{k}{L_j}$. Noting $E$ and $E_{out}$ are dense sets, we can compute $Q_j(n,k)$ as

$$Q_j(n,k) \overset{(a)}{=} \mathbb{P}\{E_{out}\} \overset{(b)}{\doteq} e^{-L_j \min_{P \in E_{out}} D(P\|Q)} \tag{62}$$

where (a) follows from the definition of $Q_j(n,k)$ as the probability of having exactly $k$ errors out of the $n$ packets sent over the paths of type $j$ given in section V, and (b) results from Sanov's Theorem.

Knowing the fact that the Kullback-Leibler distance, $D(P\|Q)$, is a convex function of $P$ and $Q$ [63], we conclude that its minimum over the convex set $E$ either lies on an interior point which is a global minimum of the function over the whole set $\mathcal{P}$ or is located on the boundary of $E$. However, we know that the global minimum of the Kullback-Leibler distance occurs at $P = Q \notin E$. Thus, the minimum of $D(P\|Q)$ is located on the boundary of $E$. This results in

$$Q_j(n,k) \overset{(a)}{\doteq} e^{-L_j \min_{P \in E_{out}} D(P\|Q)} = e^{-L_j \min_{P \in E} D(P\|Q)} \overset{(b)}{=} e^{-L_j u_j\left(\frac{k}{n}\right)} \tag{63}$$

where (a) and (b) follow from equations (62) and (14), respectively.
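The minimization of the Kullback-Leibler distance over a mean constraint can be explored numerically. The sketch below relies on a standard fact that is not stated in the text: the minimizer of $D(P\|Q)$ subject to $\sum_i iP(i) = \beta$ is an exponentially tilted version of $Q$. All function names are hypothetical, the tilt parameter is found by bisection, and $Q$ is assumed to have strictly positive entries with $\beta$ strictly inside the range of its support.

```python
import math

def kl_divergence(p, q):
    """D(P||Q) in nats; terms with P(i) = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tilted(q, theta):
    """Exponentially tilted pmf proportional to Q(i) * exp(theta * i)."""
    w = [qi * math.exp(theta * i) for i, qi in enumerate(q)]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p):
    return sum(i * pi for i, pi in enumerate(p))

def min_kl_with_mean(q, beta, lo=-50.0, hi=50.0, iters=200):
    """Minimize D(P||Q) subject to mean(P) = beta by bisection on the tilt.

    The mean of the tilted pmf is increasing in theta, so bisection applies.
    Intended for small supports, since exp(theta * i) grows quickly.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilted(q, mid)) < beta:
            lo = mid
        else:
            hi = mid
    p_star = tilted(q, 0.5 * (lo + hi))
    return kl_divergence(p_star, q), p_star
```

With `q = [0.7, 0.2, 0.1]`, whose mean is 0.4, asking for `beta = 0.4` returns a distance of essentially zero, while any larger `beta` returns a strictly positive exponent, matching the observation that the minimum sits on the boundary once $Q$ lies outside the set.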



1) We prove the first part of the theorem by induction on $J$. When $J = 1$, the statement is correct for both cases of $\frac{K}{N} > \mathbb{E}\{x_1\}$ and $\frac{K}{N} \le \mathbb{E}\{x_1\}$, recalling the fact that $P_e(n,k,1) = P_e^{opt}(n,k,1)$ and $u_1(x) = 0$ for $x \le \mathbb{E}\{x_1\}$. Now, let us assume the first part of the theorem is true for $j = 1$ to $J-1$. We prove the same statement for $J$ as well. The proof can be divided into two different cases, depending on whether $\frac{K}{N}$ is larger than $\mathbb{E}\{x_J\}$ or vice versa.

1.1) $\frac{K}{N} > \mathbb{E}\{x_J\}$

According to the definition, the value of $P_e(N,K,J)$ is computed by minimizing $\sum_{i=0}^{n_J} Q_J(n_J,i) P_e(N-n_J,K-i,J-1)$ over $n_J$ (see equation (23)). Now, we show that for any value of $n_J$, the corresponding term in the minimization is asymptotically at least equal to $P_e^{opt}(N,K,J)$. $n_J$ can take integer values in the range $0 \le n_J \le N$. We split this range into three non-overlapping intervals of $0 \le n_J \le \epsilon L$, $\epsilon L < n_J \le N(1-\epsilon)$, and $N(1-\epsilon) < n_J \le N$ for any arbitrary constant $\epsilon < \min\left\{\gamma_J,\ 1 - \frac{K}{N}\right\}$. The reason is that equation (63) is valid in the second interval only, and we need separate analyses for the first and last intervals.
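The min-sum recursion over path types described above can be sketched as a small dynamic program. This is an illustration under loud assumptions, not the paper's algorithm as implemented by the authors: the per-type loss distribution is replaced by a binomial stand-in with independent losses (`binomial_pmf`), the function names are hypothetical, and the last remaining type is forced to take all remaining packets.

```python
from functools import lru_cache
from math import comb

def binomial_pmf(n, i, p):
    """Stand-in for the per-type loss pmf: i losses out of n, independent with prob p."""
    return comb(n, i) * p**i * (1.0 - p)**(n - i)

def suboptimal_pe(n_total, k_total, loss_probs):
    """Min-sum recursion over path types, minimizing over the packets given to each type.

    loss_probs[j - 1] is the hypothetical loss probability of type j.
    Returns the recursion's value for 'more than k_total losses out of n_total'.
    """
    @lru_cache(maxsize=None)
    def pe(n, k, j):
        if k < 0:
            return 1.0  # more than a negative number of losses is certain
        if j == 1:
            # Only one type left: it carries all n packets.
            return sum(binomial_pmf(n, i, loss_probs[0]) for i in range(k + 1, n + 1))
        return min(
            sum(binomial_pmf(nj, i, loss_probs[j - 1]) * pe(n - nj, k - i, j - 1)
                for i in range(nj + 1))
            for nj in range(n + 1)
        )
    return pe(n_total, k_total, len(loss_probs))
```

Because the inner minimization may pick a different split for every value of `i`, the returned value can fall below the loss probability of any fixed allocation, which is consistent with the direction of the inequality used in the proof of Lemma V.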

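First, we show the statement for $\epsilon L < n_J \le N(1-\epsilon)$. Defining $i_J = \left\lceil n_J \frac{K}{N} \right\rceil$, we have

$$\begin{aligned} \frac{i_J}{n_J} &= \frac{K}{N} + O\left(\frac{1}{L}\right) \\ \frac{K - i_J}{N - n_J} &= \frac{K}{N} + O\left(\frac{1}{L}\right) \end{aligned} \tag{64}$$

as $\epsilon$ is constant, and $K = O(L)$, $N = O(L)$. Hence, we have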

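$$\begin{aligned} \sum_{i=0}^{n_J} Q_J(n_J,i) P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,i_J) P_e(N-n_J,K-i_J,J-1) \\ &\overset{(a)}{\doteq} e^{-L\gamma_J u_J\left(\frac{K}{N} + O\left(\frac{1}{L}\right)\right)} e^{-L \sum_{j=1}^{J-1} \gamma_j u_j\left(\frac{K}{N} + O\left(\frac{1}{L}\right)\right)} \\ &\overset{(b)}{\doteq} e^{-L \sum_{j=1}^{J} \gamma_j u_j\left(\frac{K}{N}\right)} \end{aligned} \tag{65}$$

where (a) follows from (63) and the induction assumption, and (b) follows from the fact that the $u_j()$'s are differentiable functions according to Lemma I in subsection IV-B.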

For $0 \le n_J \le \epsilon L$, since $\epsilon < \gamma_J$, the number of packets assigned to the paths of type $J$ is less than the number of such paths. Thus, one packet is allocated to $n_J$ of the paths, and the rest of the paths of type $J$ are not used. Defining $\pi_{b,J}$ as the probability of a path of type $J$ being in the bad state, we can write

$$Q_J(n_J,n_J) = \pi_{b,J}^{n_J} = e^{-n_J \log\left(\frac{1}{\pi_{b,J}}\right)}. \tag{66}$$

Therefore, for $0 \le n_J \le \epsilon L$, we have

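$$\begin{aligned} \sum_{i=0}^{n_J} Q_J(n_J,i) P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,n_J) P_e(N-n_J,K-n_J,J-1) \\ &\doteq e^{-L \sum_{j=1}^{J-1} \gamma_j u_j\left(\frac{K-n_J}{N-n_J}\right) - n_J \log\left(\frac{1}{\pi_{b,J}}\right)} \\ &\overset{(a)}{\ge} e^{-L \sum_{j=1}^{J-1} \gamma_j u_j\left(\frac{K}{N}\right) - L\epsilon \log\left(\frac{1}{\pi_{b,J}}\right)} \\ &\overset{(b)}{\doteq} e^{-L \sum_{j=1}^{J-1} \gamma_j u_j\left(\frac{K}{N}\right)} \ge e^{-L \sum_{j=1}^{J} \gamma_j u_j\left(\frac{K}{N}\right)} \end{aligned} \tag{67}$$

where (a) follows from the fact that $\frac{K-n_J}{N-n_J} \le \frac{K}{N}$, and (b) results from the fact that we can select $\epsilon$ arbitrarily small.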



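Finally, we prove the statement for the case $n_J > N(1-\epsilon)$. In this case, we have

$$\begin{aligned} \sum_{i=0}^{n_J} Q_J(n_J,i) P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,K) P_e(N-n_J,0,J-1) \\ &\overset{(a)}{\ge} e^{-L\gamma_J u_J\left(\frac{K}{N(1-\epsilon)}\right)} \\ &\overset{(b)}{\ge} e^{-L \sum_{j=1}^{J} \gamma_j u_j\left(\frac{K}{N}\right)} \end{aligned} \tag{68}$$

where (a) follows from the fact that $\epsilon < 1 - \frac{K}{N}$ and $P_e(n,0,j) = 1$, for all $n$ and $j$. Setting $\epsilon$ small enough results in (b).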

Inequalities (65), (67), and (68) result in

$$P_e(N,K,J) \ge e^{-L \sum_{j=1}^{J} \gamma_j u_j\left(\frac{K}{N}\right)}. \tag{69}$$

Combining (69) with Lemma V proves the first part of Theorem III for the case when $\frac{K}{N} > \mathbb{E}\{x_J\}$.
1.2) $\frac{K}{N} \le \mathbb{E}\{x_J\}$

Similar to the case of $\frac{K}{N} > \mathbb{E}\{x_J\}$ in subsection 1.1, we show that for any value of $0 \le n_J \le N$, the corresponding term of the minimization in equation (23) is asymptotically at least equal to $P_e^{opt}(N,K,J)$. Again, the range of $n_J$ is partitioned into three non-overlapping intervals.

For any arbitrary $0 < \epsilon < \min\left\{\gamma_J,\ 1 - \frac{K}{N},\ \frac{1}{K}\right\}$, and for all $n_J$ in the range of $\epsilon L < n_J \le N(1-\epsilon)$, we define $i_J$ as $i_J = \lceil n_J \mathbb{E}\{x_J\} \rceil$. We have

^ = E{xj} + o(j/\> E{xj} 
K - %3j <$ + o(l) (70) 



N-nj N 



Hence, 



QAnj, i)Pe{N -nj,K-i,J-l) 

i=0 

> Qj(nj,ij)P e (N -nj,K -ij,J - 1) 
= e J_i 



W -Ljjuj [E{xj} + O (j^J 



3- 

J 



K 

■ ^ J " J I A 7 



-L ^ ljUi 



= e ■ J=1 (71) 

where (a) follows from (63) and the induction assumption, and (b) is based on (70). (c) results from the facts that the $u_j()$'s are differentiable functions, and we have $u_J(\mathbb{E}\{x_J\}) = 0$, both according to Lemma I in subsection IV-B.

For $0 \le n_J \le \epsilon L$, the analysis of section 1.1 and inequality (67) are still valid. For $n_J > (1-\epsilon)N$, we set $i_J = \lceil \mathbb{E}\{x_J\} n_J \rceil$. Now, we have

$$i_J \ge n_J \mathbb{E}\{x_J\} > (1-\epsilon) N \mathbb{E}\{x_J\} \ge (1-\epsilon) K. \tag{72}$$

The above inequality can be written as

$$K - i_J < \epsilon K < 1 \tag{73}$$

since $\epsilon < \frac{1}{K}$. Noting that $K$ and $i_J$ are integer values, it is concluded that $K \le i_J$. Now, we can write

QA^J, i)Pe(N - nj, K - i, J - 1) 

i=0 

> Qj(nj,ij)P e (N-nj,K-ij,J-l) 
= Qj(nj,ij) 

-Ljjuj ( E {xj} + — 

> e V ™j 



-LjjUj(e{xj} + 0( \ ]) (c) 



e V V L // = 1 (74) 

where (a) follows from the fact that $K \le i_J$, and $P_e(n,k,j) = 1$, for $k \le 0$. (b) and (c) result from $n_J > (1-\epsilon)N$ and $u_J(\mathbb{E}\{x_J\}) = 0$, respectively.

Hence, inequalities (67), (71), and (74) result in

$$P_e(N,K,J) \ge e^{-L \sum_{j=1}^{J} \gamma_j u_j\left(\frac{K}{N}\right)} \tag{75}$$

which proves the first part of Theorem III for the case of $\frac{K}{N} \le \mathbb{E}\{x_J\}$ when combined with Lemma V.



2) We prove the second and the third parts of the theorem by induction on $j$ while the total number of types, $J$, is fixed. The proof of the statements for the base of the induction, $j = J$, is similar to the proof of the induction step, from $j+1$ to $j$. Hence, we just give the proof for the induction step. Assume the second and the third parts of the theorem are true for $m = J$ to $j+1$. We prove the same statements for $j$. The proof is divided into two different cases, depending on whether $\frac{K}{N}$ is larger than $\mathbb{E}\{x_j\}$ or vice versa.

Before we proceed further, it is helpful to introduce two new parameters $N'$ and $K'$ as

$$N' = N - \sum_{m=j+1}^{J} N_m, \qquad K' = K - \sum_{m=j+1}^{J} K_m.$$

According to the above definitions and the induction assumptions, it is obvious that

$$\frac{K'}{N'} = \frac{K}{N} + o(1) = \alpha + o(1). \tag{76}$$

2.1) $\frac{K}{N} > \mathbb{E}\{x_j\}$

First, by contradiction, it will be shown that for small enough values of $\epsilon > 0$, we have $N_j > \epsilon N'$. Let us assume the opposite is true, i.e. $N_j \le \epsilon N'$. Then, we can write

P e (N',K',j) 

( = } ^PtiN'-N^K'-iJ-VQjiN^i) 

i=0 

> P e {N'-Nj,K'-Nj,j-l)Qj{Nj,Nj) 
j'-i 



K' - Nj 

Qj{Nj,Nj)e > \ N '~ N i 




J W 



-L 7 r -u r (a) 



> e r=i (77) 

where (a) follows from equation (23) and step (2) of our suboptimal algorithm, (b) results from the first part of Theorem III, and (c) can be justified using arguments similar to those of inequality (67). (d) is obtained assuming $\epsilon$ is small enough such that the corresponding term in the exponent is strictly less than $L\gamma_j u_j\left(\frac{K'}{N'}\right)$ and also the fact that $\frac{K'}{N'} = \alpha + o(1)$. The result in (77) is obviously in contradiction with the first part of Theorem III, proving that $N_j > \epsilon N'$.

Now, we show that if $N_j > (1-\epsilon)N'$ for arbitrarily small values of $\epsilon$, we should have $\mathbb{E}\{x_r\} > \alpha$ for all $1 \le r \le j-1$. In such a case, we observe $\frac{N_j}{N'} = 1 + o(1)$, proving the second statement of Theorem III. To show this, let us assume $N_j > (1-\epsilon)N'$. Hence,

P e (N', K', j) = Pe(N' - Nj, K' -i, 3 - l)Qj(Nj, i) 

i=0 

>P e (N'-Nj,0,j-l)Qj(Nj,K') 

> e ' L ^ u i{ (1 - c)N i ) -L. e ~L ljUj {a+o{l)) ^yg^ 

where (a) follows from the fact that $P_e(n,0,j) = 1$, for all values of $n$ and $j$, and the fact that $N_j > (1-\epsilon)N'$. (b) is obtained by making $\epsilon$ arbitrarily small and using equation (76). Applying (78) and knowing the fact that $P_e(N',K',j) \doteq e^{-L \sum_{r=1}^{j} \gamma_r u_r(\alpha)}$, we conclude that $\mathbb{E}\{x_r\} > \alpha$, for all values of $1 \le r \le j-1$.

$P_e(N',K',j)$ can be written as

Pe(N',K',j) 



= mm 



0<N,<N' 



i=0 



(a) 



(b) 



mm max 

eN'<Nj<(l-e)N' 0<i<Nj 

P e (N' - Nj, K' - l)Qj(Nj, i) 



mm max 

eN'<N J <(l-e)N' E{xj}Nj <i<Nj 



3-1 



~ LljUj ( W j ~ L S 7rMr 



A/"' - AT,- 



max 



mm 



eN'<Nj<(l-e)N' E{xj}Nj<i<N. 



Mi{i,Nj) 



( c ) — L max min M c (8j,\j) 



(79) 



where Md(i,Nj) and M c (0j,Xj) are defined as 

M d {i,Nj) = ljUj 



z 

A, 



r=l 
r=l 



K' 



N - N, 



a — 3j 



In (79), (a) follows from the fact that $N_j$ is bounded as $\epsilon N' < N_j < (1-\epsilon)N'$. (b) results from equation (63), $P_e(n,k,j)$ being a decreasing function of $k$, and the fact that we have $Q_j(N_j,i) \le 1 \doteq Q_j(N_j, \mathbb{E}\{x_j\}N_j)$ for $i < \mathbb{E}\{x_j\}N_j$. $\beta_j$ and $\lambda_j$ are defined as $\beta_j = \frac{i}{N'}$ and $\lambda_j = \frac{N_j}{N'}$. (c) is a result of having $M_c(\beta_j,\lambda_j) = M_d(i,N_j) + O\left(\frac{1}{L}\right)$. Hence, the discrete to continuous relaxation is valid.

Let us define $(\beta_j^*, \lambda_j^*)$ as the values of $(\beta_j, \lambda_j)$ which solve the max-min problem in (79). Differentiating $M_c(\beta_j,\lambda_j)$ with respect to $\beta_j$ and $\lambda_j$ results in



(0 



E{x r .}<C 



J\* 2 J 



E{a; r }<C 



+ 



X* 



3-1 



\ 



E T 

r=l, 
E{x r }<C 



7r 



;*r (0 



801 



dXj {Xj - x *i 
where ( = - Solving the above equations gives the unique optimum solution $(\beta_j^*, \lambda_j^*)$ as

1 — A„- " 



i 

r=l,Q>E{x r } 



A* = (80) 



Hence, the integer parameters $K_j$, $N_j$ defined in the suboptimal algorithm have to satisfy $\frac{K_j}{N'} = \beta_j^* + o(1)$ and $\frac{N_j}{N'} = \lambda_j^* + o(1)$, respectively. Based on the induction assumption, it is easy to show that



^ IrUr(a) 
N' r=l,E{x r }<a 

~N = J 

r=l,E{x r }<a 



(81) 



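which completes the proof for the case of $\mathbb{E}\{x_j\} < \frac{K}{N}$.

2.2) $\frac{K}{N} \le \mathbb{E}\{x_j\}$

In this case, we show that $\frac{N_j}{N'} = o(1)$. Defining $i_j = \lceil \mathbb{E}\{x_j\} N_j \rceil$, we have

$$\frac{K' - i_j}{N' - N_j} = \alpha - (\mathbb{E}\{x_j\} - \alpha) \frac{N_j}{N' - N_j} + o(1) \tag{82}$$

using equation (76). Now, we have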

Pe(N',K',j) 

= J2 P ^ N '-tij,K'-i,j-l)Q 3 (K 



3,V 



i=0 



> P e (N' - Nj, K' - - l)Qj(Nj, ij) 
2 -L ljUj (Efo} + o(l)) . 

— l 7 r u r [a — (E{x,- } — a) 

e ^ V ^'-^i. 

—Ly ^ r u r \a — (K\xj \ — a) — 

- P H v n '- n j 



(83) 



where (a) follows from the first part of Theorem III and (63). On the other hand, according to the result of the first part of Theorem III, we know that

$$P_e(N',K',j) \doteq e^{-L \sum_{r=1}^{j-1} \gamma_r u_r(\alpha)}. \tag{84}$$

According to Lemma I, $u_r(\beta)$ is an increasing function of $\beta$ for all $1 \le r \le j-1$. Thus, $\sum_{r=1}^{j-1} \gamma_r u_r(\beta)$ is also a one-to-one increasing function of $\beta$. Noting this fact and comparing (83) and (84), we conclude that $\frac{N_j}{N' - N_j} = o(1)$ as $\mathbb{E}\{x_j\} - \alpha$ is strictly positive. Noting (81), we have $\frac{N_j}{N} = o(1)$, which proves the second part of Theorem III for the case of $\frac{K}{N} \le \mathbb{E}\{x_j\}$.